Abstract

In this paper we consider a model for the spread of a stochastic SIR (Susceptible
→ Infectious → Recovered) epidemic on a network of individuals described by a random
intersection graph. Individuals belong to a random number of cliques, each of
random size, and infection can be transmitted between two individuals if and only if
there is a clique they both belong to. Both the clique sizes and the number of cliques
an individual belongs to follow mixed Poisson distributions. An infinite-type branching
process approximation (with type being given by the length of an individual's
infectious period) for the early stages of an epidemic is developed and made fully
rigorous by proving an associated limit theorem as the population size tends to infinity.
This leads to a threshold parameter R*, so that in a large population an epidemic
with few initial infectives can give rise to a large outbreak if and only if R* > 1.
A functional equation for the survival probability of the approximating infinite-type
branching process is determined; if R* ≤ 1, this equation has no non-zero solution, whilst, if
R* > 1, it is shown to have precisely one non-zero solution. A law of large numbers
for the size of such a large outbreak is proved by exploiting a single-type branching
process that approximates the susceptibility set of a typical individual.

Traditional models for the spread of SIR (Susceptible → Infectious → Recovered)
epidemics [2, 15] are based on the homogeneous mixing assumption, that is, all pairs of
individuals in the population contact each other at the same rate, independently of each other.
Generalizations of this model have been proposed by introducing household structure into
the population [4], where contacts between household members are more frequent than
other contacts; by introducing a (social) network structure [1, 25], where contacts are only
possible between pairs of individuals that share a connection in the network; or both [7, 8].
In most models for epidemics on networks, the network is modelled by a random graph 
constructed via the configuration model [23], [16, Chapter 3]. In this construction one can 
control the degree distribution of the vertices, but the resulting network is locally tree- 
like, in the sense that the network contains hardly any cliques (small completely connected 
groups) or short loops. In real social networks cliques are not sparse: 'the friends of my 
friends are likely to be my friends as well'. This feature of networks has been captured 
(among other models, such as those in [30, 27, 17]) by random intersection graphs, intro- 
duced in [22] and further studied in e.g. [11, 14, 34] (see [10] for a related model). Random 
intersection graphs might be seen as models for overlapping groups/cliques, in which a 
contact between two individuals is possible only if there is a group to which they both 
belong. These graphs are also known as random key graphs in computer science [21] and 
are related to Rasch models [32] in the social sciences. In our paper, and in most random 
intersection graph models in the literature, the resulting graph still has a tree structure, 
though now at the level of cliques. This structure allows for analysis, but arguably only 
captures some features of real (social) networks. It is possible to make the graphs more 
realistic by incorporating spatial location [19], but this makes the model intractable for our 
purposes. 

The aim of this paper is to study SIR epidemics on random intersection graphs. Specif- 
ically, we use branching process approximations to derive (i) a threshold parameter R*, 
which determines whether an epidemic with few initial infectives can become established 
and infect a non-negligible proportion of the population, an event we call a large outbreak; 
(ii) the probability that a large outbreak occurs; and (iii) the fraction of the population 
that is infected by a large outbreak. These approximations are made fully rigorous as the 
population size tends to infinity by proving associated limit theorems. 

The only previous rigorous study of epidemics on random intersection graphs is [11]. 
We extend the analysis of [11] in three directions. First, we allow more general distributions 
for both group size and the number of groups a typical individual belongs to. In [11], both 
of these quantities follow Poisson distributions; here we allow them to follow mixed-Poisson 
distributions. Moreover, as discussed in Section 6, we expect similar results to hold when 
they both follow quite general distributions, though our proofs are valid only for the mixed- 
Poisson case. Secondly, we allow for an arbitrary infectious period distribution, unlike 
in [11] where a Reed-Frost type model [2, Section 1.2] (which effectively has a constant
infectious period) is used. Thirdly, we give a formal proof of a law of large numbers for 
the final outcome of a large outbreak, a result that was conjectured but not proved in [11]. 
Introducing variable infectious periods significantly complicates the analysis. We note that 
for random infectious periods, our model is not covered by [10, Section 5], since we need 
directed inhomogeneous random graphs and the proofs in [10] rely heavily on the structure 
of undirected graphs. Therefore, we need to develop alternative techniques to determine 
the fraction of the population that is infected by a large outbreak. 

The remainder of the paper is organized as follows. Section 2 gives a brief introduction 
to random intersection graphs and SIR epidemics defined upon them. The main results 
of the paper, together with associated heuristic explanations, are given in Section 3. In
particular, in Section 3.2 we show how the early stages of an epidemic in our model can be
approximated by a multitype (forward) branching process (whose type space is in general 
uncountable), yielding a threshold parameter R* and the approximate probability of a
large outbreak. In Section 3.3, a single-type (backward) branching process, which enables 
the proportion of the population that is infected by a large outbreak to be determined, 
is described. The key limit theorems of the paper are stated in Section 3.4. They show 
that, if there are few initial infectives, then in a large population: (i) a large outbreak 
can occur only if the forward branching process is supercritical; (ii) the probability that 
a large outbreak occurs is close to the probability that the forward branching process 
survives; and (iii) if there is a large outbreak, then the proportion of the population that 
is infected by the epidemic is close to the survival probability of the backward branching 
process. The forward multitype branching process is studied in Section 4, where it is 
shown that the process survives with non-zero probability if and only if R* > 1 and that
the survival probability may be obtained using a functional equation, which, as is proved in 
Appendix A, has at most one non-zero solution. The limit theorems corresponding to the 
forward and backward branching processes are proved in Sections 5.1 and 5.2, respectively. 
Extension to more general distributions of clique size and the number of groups a typical 
individual belongs to is discussed briefly in Section 6. Explicit expressions, in terms of 
Gontcharoff polynomials, for R* and for the probability generating function(als) of the
offspring distributions of the backward and forward branching processes (which enable the 
survival probabilities of these processes to be computed) are derived in Appendix B. 



2 Epidemics and random intersection graphs 
2.1 Notation 

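Throughout, $\mathbb{N}$ denotes the set of natural numbers not including 0, while
$\mathbb{Z}_+ = \mathbb{N} \cup \{0\}$. For $x > 0$, $\lfloor x \rfloor = \max(y \in \mathbb{Z}_+ : y \le x)$
is the floor of $x$, and $\lceil x \rceil = \min(y \in \mathbb{Z}_+ : y \ge x)$ is the ceiling of $x$.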

Furthermore, we use the standard asymptotic notation $f(x) = O(g(x))$, $f(x) = o(g(x))$ and
$f(x) = \Theta(g(x))$.

A (directed or undirected) graph is simple if it contains no parallel edges (edges that share
both end-vertices) or self-loops (edges with only one end-vertex). In a directed graph,
edges are parallel if they share both end-vertices and have the same direction. In a
multigraph self-loops and parallel edges are allowed. We may construct a directed graph from an
undirected one by replacing every undirected edge by two directed edges with the same
end-vertices but having opposite directions. If we construct a simple graph from a multigraph,
we do this by merging parallel edges and removing self-loops.

We use $\mathbb{P}$ for general unspecified probability measures, for which the interpretation
is clear from the context, and $\mathbb{E}$ for the associated expectation. We use $\mathbb{E}_X$ to denote
expectation with respect to the random variable $X$. However, if no confusion is possible
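we sometimes drop the subscript. For the non-negative random variable $X$, a
mixed-Poisson($X$) random variable, $Y$, is defined by
$\mathbb{P}(Y = k) = \mathbb{E}_X[X^k e^{-X}/k!]$, for $k \in \mathbb{Z}_+$. We say
that a random variable is $\mathcal{P}(x)$ if it is Poisson distributed with mean $x$ and
$\mathcal{MP}(X)$ if it has a mixed-Poisson($X$) distribution. We use $\tilde{X}$ to denote the
size-biased variant of the non-negative random variable $X$, so, provided
$\mathbb{E}[X] \in (0, \infty)$, for $x \ge 0$ we have

$$\mathbb{P}(\tilde{X} \le x) = \frac{\mathbb{E}[X \, 1(X \le x)]}{\mathbb{E}[X]} = \frac{\int_{[0,x]} y \, \mathbb{P}(X \in dy)}{\mathbb{E}[X]}. \tag{2.1}$$

Here $1(A)$ is the indicator function of $A$, which is 1 if $A$ holds and 0 otherwise, and we
assume that $X$ is not almost surely 0. Note that if $Y \sim \mathcal{MP}(X)$, then
$\tilde{Y} \sim \mathcal{MP}(\tilde{X}) + 1$; in this situation we use the notation $\bar{Y}$ to denote a
random variable with the same distribution as $\tilde{Y} - 1$, so that if $Y \sim \mathcal{MP}(X)$, then
$\bar{Y} \sim \mathcal{MP}(\tilde{X})$. This implies that $\mathbb{E}[\bar{Y}] = \mathbb{E}[\tilde{X}]$.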

Let $X_n \Rightarrow X$ denote convergence in distribution. By [18, Theorem 7.2.19] we know that if
$X_n \Rightarrow X$, then $\mathbb{E}[X_n 1(X_n \le x)] \to \mathbb{E}[X 1(X \le x)]$ for all points of
continuity of $\mathbb{P}(X \le x)$. This implies that if $\mathbb{E}[X_n] \to \mathbb{E}[X]$ and
$X_n \Rightarrow X$, then $\tilde{X}_n \Rightarrow \tilde{X}$.

We also use the notation $f_X(s) = \mathbb{E}[s^X]$ ($s \in [0,1]$) for the probability generating
function of a $\mathbb{Z}_+$-valued random variable $X$ and $\phi_X(\theta) = \mathbb{E}[e^{-\theta X}]$
($\theta \ge 0$) for the moment generating function of a real-valued random variable $X$. Note that if
$Y \sim \mathcal{MP}(X)$ then $\mathbb{E}[Y] = \mathbb{E}[X]$ and $f_Y(s) = \phi_X(1 - s)$. Lastly, for any
set $A$ we denote its cardinality by $|A|$.
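
The identities above can be checked numerically. The following is a minimal sketch, assuming a mixing variable $X$ with finite support; the three-point distribution and all function names are hypothetical and chosen only for illustration. It evaluates the mixed-Poisson probabilities, compares $\mathbb{E}[Y]$ with $\mathbb{E}[X]$ and $f_Y(s)$ with $\phi_X(1-s)$, and provides the size-biased weights needed to compare $\bar{Y}$ with $\mathcal{MP}(\tilde{X})$.

```python
import math

def mixed_poisson_pmf(k, support, probs):
    """P(Y = k) = E[X^k e^{-X} / k!] when X takes support[i] with probability probs[i]."""
    return sum(p * x**k * math.exp(-x) / math.factorial(k)
               for x, p in zip(support, probs))

def size_biased(support, probs):
    """Probabilities of the size-biased variant of X: proportional to x * P(X = x)."""
    mean = sum(x * p for x, p in zip(support, probs))
    return [x * p / mean for x, p in zip(support, probs)]

def pgf_mixed_poisson(s, support, probs, kmax=100):
    """f_Y(s) = sum_k s^k P(Y = k), truncated after kmax terms."""
    return sum(s**k * mixed_poisson_pmf(k, support, probs) for k in range(kmax))

def laplace_transform(theta, support, probs):
    """phi_X(theta) = E[exp(-theta * X)]."""
    return sum(p * math.exp(-theta * x) for x, p in zip(support, probs))

# Hypothetical three-point mixing distribution, used only as an illustration.
support = [0.5, 2.0, 4.0]
probs = [0.5, 0.3, 0.2]
mean_x = sum(x * p for x, p in zip(support, probs))
mean_y = sum(k * mixed_poisson_pmf(k, support, probs) for k in range(100))
print(mean_x, mean_y)  # the two means should agree up to truncation error
print(pgf_mixed_poisson(0.3, support, probs), laplace_transform(0.7, support, probs))
```

The truncation level `kmax` is kept at 100 because the example means are small; a mixing distribution with much larger values would need a different evaluation strategy.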



2.2 Random intersection graphs 

We consider a variant of random intersection graphs [11, 14, 22] constructed via a bipartite 
generalization of Norros and Reittu's Poissonian random graph model [28]. Random
intersection graphs may be thought of as random graphs composed of overlapping groups/cliques
of individuals/vertices. We note that the model introduced in [22] is more general than 
(the equal-weight variant of) the model presented in this paper. 

We construct a sequence of random intersection graphs as follows. Consider two infinite
sets of vertices $V = (v_i, i \in \mathbb{N})$ and $V' = (v'_j, j \in \mathbb{N})$. Fix a real number
$\alpha > 0$. Assign independent and identically distributed (i.i.d.) weights
$(A_i, i \in \mathbb{N})$ to the vertices in $V$, all distributed as the non-negative random variable $A$
and, independently, i.i.d. weights $(B_j, j \in \mathbb{N})$ to the vertices in $V'$, all distributed as
the non-negative random variable $B$. Assume that

$$\mu = \mathbb{E}[A] = \alpha \mathbb{E}[B] \in (0, \infty), \tag{2.2}$$

though see Remark 2.3 below. Define

$$L^{(n)} = \sum_{i=1}^{n} A_i \tag{2.3}$$

and

$$L'^{(n)} = \sum_{j=1}^{\lfloor \alpha n \rfloor} B_j. \tag{2.4}$$

Let $(\Omega, \mathcal{F}, \nu)$ be the corresponding probability space, where
$\Omega = (\mathbb{R}_+)^{\mathbb{N}} \times (\mathbb{R}_+)^{\mathbb{N}}$ is the product space of non-negative
real-valued infinite sequences $(A_i, i \in \mathbb{N})$ and $(B_j, j \in \mathbb{N})$. The $\sigma$-field
$\mathcal{F}$ is generated by the finite dimensional cylinders on $\Omega$ and $\nu$ is the appropriate
(product) measure determined by the distributions of $A$ and $B$. We note that, by the strong law of
large numbers, both $L^{(n)}/(\mu n) \xrightarrow{a.s.} 1$ and $L'^{(n)}/(\mu n) \xrightarrow{a.s.} 1$
as $n \to \infty$. Here $\xrightarrow{a.s.}$ denotes almost sure convergence with respect to the
measure $\nu$.

Figure 1: Construction of $G^{(n)}$ from $\mathcal{A}^{(n)}$.

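For given $\omega \in \Omega$, an auxiliary sequence of random undirected multigraphs
$(\mathcal{A}^{(n)}, n \in \mathbb{N}) = (\mathcal{A}^{(n)}(\omega), n \in \mathbb{N})$ is constructed as follows.
For each $n$, the vertex set of $\mathcal{A}^{(n)}$ consists of $V^{(n)} = (v_i, 1 \le i \le n)$ and
$V'^{(n)} = (v'_j, 1 \le j \le \lfloor \alpha n \rfloor)$. Vertices $v_i \in V^{(n)}$ and $v'_j \in V'^{(n)}$
share a $\mathcal{P}(A_i B_j/(\mu n))$ number of edges (see Remark 2.1). Conditioned on the weights of
vertices, i.e. on $\omega$, the numbers of edges between distinct pairs of vertices are independent
and there is no edge in $\mathcal{A}^{(n)}$ connecting vertices either both in $V^{(n)}$ or both in
$V'^{(n)}$. Note that in $\mathcal{A}^{(n)}$, the degree of vertex $v_i \in V^{(n)}$ is
$\mathcal{P}(A_i^{(n)})$ with

$$A_i^{(n)} = A_i L'^{(n)}/(\mu n) \xrightarrow{a.s.} A_i \quad \text{as } n \to \infty, \tag{2.5}$$

while the degree of vertex $v'_j \in V'^{(n)}$ is $\mathcal{P}(B_j^{(n)})$ with

$$B_j^{(n)} = B_j L^{(n)}/(\mu n) \xrightarrow{a.s.} B_j \quad \text{as } n \to \infty. \tag{2.6}$$

The random variables $A^{(n)}$ and $B^{(n)}$ are defined by

$$\mathbb{P}(A^{(n)} \le x) = n^{-1} |\{1 \le i \le n : A_i^{(n)} \le x\}| \quad (x \ge 0) \tag{2.7}$$

and

$$\mathbb{P}(B^{(n)} \le x) = \lfloor \alpha n \rfloor^{-1} |\{1 \le j \le \lfloor \alpha n \rfloor : B_j^{(n)} \le x\}| \quad (x \ge 0). \tag{2.8}$$

Thus, $A^{(n)}(\omega)$ and $B^{(n)}(\omega)$ are random variables with the empirical distribution of the
rescaled weights $\{A_i^{(n)}\}$ and $\{B_j^{(n)}\}$, respectively. By the strong law of large numbers,
$A^{(n)} \Rightarrow A$ and $B^{(n)} \Rightarrow B$ as $n \to \infty$.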

For the purpose of this paper it is not important how the graphs in the sequence depend
on each other. For simplicity we assume that, conditioned on
$\omega = (A_i, i \in \mathbb{N}) \times (B_j, j \in \mathbb{N})$, the graphs
$(\mathcal{A}^{(n)}, n \in \mathbb{N})$ are independent.

The vertices of the random intersection graph $G^{(n)}$ are precisely those in $V^{(n)}$. Two
(distinct) vertices share an edge in $G^{(n)}$ if and only if there is at least one path of length
2 between them in $\mathcal{A}^{(n)}$. Thus, $G^{(n)}$ is a simple graph. This construction is visualized
in Figure 1. We note that $G^{(n)}$ is slightly different from an ordinary random intersection
graph. In [11, 14] the conditional probability that vertices with weights $A_i$ and $B_j$ share
an edge in $\mathcal{A}^{(n)}$ is given by $\min(1, A_i B_j/(\mu n))$, as opposed to
$1 - \exp[-A_i B_j/(\mu n)]$ in this paper.

Remark 2.1. Of course it is possible to construct a simple version of the (multi)graph
$\mathcal{A}^{(n)}$ directly, in which the vertices $v_i$ and $v'_j$ share an edge with probability
$1 - \exp[-A_i B_j/(\mu n)]$. Indeed, this is sufficient to describe the population structure of our
model. We use the present construction, where $v_i$ and $v'_j$ share a Poisson distributed number of
edges, in order to have the machinery ready for branching process approximations.
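
The construction of the random intersection graph from the bipartite multigraph can be sketched in code. This is a minimal illustration rather than the procedure used in the proofs: the function names, the hand-written Poisson sampler and the exponential weight distributions in the usage note are all assumptions made here, and the quadratic loop over vertex pairs is only suitable for small $n$.

```python
import math
import random

def poisson(rng, lam):
    """Sample a Poisson(lam) variate by Knuth's method (adequate for small lam)."""
    threshold, k, prod = math.exp(-lam), 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k

def random_intersection_graph(n, alpha, mu, sample_a, sample_b, rng):
    """Return (cliques, adjacency) for one realisation of the graph on n vertices.

    sample_a and sample_b are hypothetical callables drawing the weights A_i and B_j;
    mu is the common mean in condition (2.2) and must be supplied by the caller.
    """
    m = math.floor(alpha * n)
    a = [sample_a(rng) for _ in range(n)]
    b = [sample_b(rng) for _ in range(m)]
    # Bipartite step: v_i and v'_j share a Poisson(A_i B_j / (mu n)) number of edges.
    cliques = [set() for _ in range(m)]
    for i in range(n):
        for j in range(m):
            if poisson(rng, a[i] * b[j] / (mu * n)) > 0:
                cliques[j].add(i)
    # Projection: two vertices are adjacent iff they share at least one clique.
    adjacency = {i: set() for i in range(n)}
    for members in cliques:
        for u in members:
            adjacency[u] |= members - {u}
    return cliques, adjacency
```

For example, `random_intersection_graph(40, 1.0, 2.0, draw, draw, random.Random(1))` with `draw = lambda r: r.expovariate(0.5)` uses exponential weights of mean 2 for both $A$ and $B$, which satisfies (2.2) with $\alpha = 1$ and $\mu = 2$.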

Remark 2.2. The graph $G^{(n)}$ is a graph of overlapping cliques, in which, asymptotically
as $n \to \infty$, the number of cliques a vertex is part of has an $\mathcal{MP}(A)$ distribution and the
clique sizes have an $\mathcal{MP}(B)$ distribution. Both of these distributions have finite mean by
assumption.

Remark 2.3. Since the random intersection graph does not change if, for some $r \in (0, \infty)$,
the random variables $A$ and $B$ are replaced by $rA$ and $B/r$, condition (2.2) might be replaced
by $\mathbb{E}[A] < \infty$ and $\mathbb{E}[B] < \infty$ but this does not gain any generality. The linear
scaling $|V'^{(n)}| = \lfloor \alpha |V^{(n)}| \rfloor$ is assumed in order to guarantee that, as
$n \to \infty$, (i) clique sizes do not grow to infinity, and (ii) two (or more) cliques contain at most
one common vertex, with high probability.

Remark 2.4. In this paper we make use of the following equivalent way of constructing
$\mathcal{A}^{(n)}$. Initially all vertices are unexplored. Pick a vertex from $V^{(n)}$ according to some
law (e.g. uniformly at random), say vertex $v_i$, which has weight $A_i$; this vertex becomes active.
Assign a $\mathcal{P}(A_i^{(n)})$ number of edges to it (see (2.5)). The end-vertices in $V'^{(n)}$ of these
edges are chosen independently with replacement and the probability that $v'_j$ is chosen is
$B_j/L'^{(n)}$. After this vertex $v_i$ is made explored, while the chosen vertices become active.

Now, if there are any, explore the active vertices from $V'^{(n)}$ one by one. Suppose that
we explore vertex $v'_j$, which has weight $B_j$; then assign a $\mathcal{P}(B_j^{(n)})$ number of edges to
it. These edges connect to vertices chosen independently, with replacement, from $V^{(n)}$; vertex
$v_i$ being chosen with probability $A_i/L^{(n)}$. If the end vertex has already been explored then
the edge is ignored and not added to the graph, otherwise it is added and the end vertex in
$V^{(n)}$ becomes active. If all the edges from $v'_j$ are drawn, then $v'_j$ is made explored.

The next step is to pick one of the active vertices from $V^{(n)}$, if there are any, according
to some, for now unspecified, law and explore it. Say that we choose $v_k$, which has weight
$A_k$. Then we proceed as in the first step. We assign a $\mathcal{P}(A_k^{(n)})$ number of edges to it, then
the end-vertices in $V'^{(n)}$ of these edges are chosen independently with replacement and the
probability that $v'_j$ is chosen is $B_j/L'^{(n)}$. If the end vertex has been explored before, then the
edge is ignored and deleted. After this, vertex $v_k$ is made explored and the newly chosen
vertices in $V'^{(n)}$ which are unexplored become active. We now explore all active vertices
in $V'^{(n)}$ in turn, and so on until there is no active vertex left. After that an unexplored
vertex from $V^{(n)}$ is chosen and the process goes on until all vertices in $V^{(n)}$ are explored.
Note that if after this construction there are unexplored vertices left in $V'^{(n)}$, they will have
degree 0, since there is no end-vertex left in $V^{(n)}$ to connect to.

2.3 SIR epidemics 

We consider a stochastic SIR epidemic on the random intersection graph $G^{(n)}$. The vertices
of the graph correspond to individuals and the edges to relationships/possible contacts. We
assume that initially there is one infectious individual/vertex, chosen uniformly at random
from the population, while all other individuals are susceptible. Every individual,
independently of other individuals, makes (directed) contact with each of its neighbours in $G^{(n)}$
at the points of independent Poisson processes of unit intensity. If an infectious individual
contacts a susceptible one, the susceptible becomes infectious. Infectious individuals
stay infectious for a random infectious period, distributed as $\mathcal{I}$, after which the infectious
individual recovers and plays no further part in the epidemic. Infectious periods are i.i.d.
and independent of the Poisson processes generating the contacts. An infectious contact is
a contact by an infectious individual, irrespective of the state of the receiving individual.
Note that there is no loss of generality in assuming that the intensity of the Poisson
processes governing the contacts is 1, since this can always be achieved by rescaling time. We
denote the above epidemic model by $\mathcal{E}^{(n)}(A, B, \mathcal{I})$.

For ease of exposition, primarily to avoid multitype branching processes that are
reducible, we assume that $\mathbb{P}(\mathcal{I} = 0) = 0$. We omit the details but our results are readily
extended to the case $\mathbb{P}(\mathcal{I} = 0) > 0$. Note, however, that we do allow for the possibility that
$\mathbb{P}(\mathcal{I} = \infty) > 0$; if an infectious individual has infinite infectious period then, almost
surely, that individual makes infectious contact with every member of each clique it belongs to.

In order to study properties of the epidemic on a graph, $G$ say, we introduce the
Epidemic Generated Graph, which is a directed graph constructed as follows. If $G$ is undirected
then make it directed by replacing every edge by two edges connecting the same vertices
but in opposite directions. Assign every vertex $v_i$ in $G$ an independent realisation,
$\mathcal{I}_i$, of the random variable $\mathcal{I}$. Now thin $G$ by deleting, independently, each edge
emanating from vertex $v_i$ with probability $e^{-\mathcal{I}_i}$. Thus an edge starting at $v_i$ is deleted if
infection would not pass along it were $v_i$ to become infected during the epidemic. The set of
vertices that can be reached in the Epidemic Generated Graph from an initially infectious vertex
$v_0$ (including $v_0$ itself) is distributed as the set of ultimately recovered individuals. The
set of vertices from which there is a path in the Epidemic Generated Graph to vertex $v_0$,
including $v_0$ itself, is said to be the susceptibility set of $v_0$ [3, 5]. If one of the vertices in
the susceptibility set of $v_0$ is the initially infectious individual, then $v_0$ will be ultimately
recovered in the epidemic.
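
The Epidemic Generated Graph lends itself to a short computational sketch. The code below is an illustration under assumptions made here (a graph given as a dictionary of neighbour sets, infectious periods supplied by the caller, hypothetical function names); it thins the directed edges as described above and then reads off the final outcome and a susceptibility set by graph search.

```python
import math
import random

def epidemic_generated_graph(adjacency, periods, rng):
    """Keep each directed edge u -> v independently with probability 1 - exp(-periods[u])."""
    kept = {u: set() for u in adjacency}
    for u in adjacency:
        p_keep = 1.0 - math.exp(-periods[u])  # exp(-inf) is 0.0, so infinite periods keep all edges
        for v in adjacency[u]:
            if rng.random() < p_keep:
                kept[u].add(v)
    return kept

def reachable(out_edges, start):
    """Vertices reachable from start, including start (the ultimately recovered set)."""
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in out_edges[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen

def susceptibility_set(out_edges, target):
    """Vertices from which target can be reached, including target itself."""
    reverse = {u: set() for u in out_edges}
    for u, targets in out_edges.items():
        for v in targets:
            reverse[v].add(u)
    return reachable(reverse, target)
```

By construction, a vertex `v` lies in `reachable(g, u)` exactly when `u` lies in `susceptibility_set(g, v)`, which is the duality used in the final sentence of the paragraph above.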

3 Main results and heuristics 
3.1 Introduction 

In this section we outline the main results of the paper, together with their heuristic 
explanations. In Section 3.2, we explain how the early stages of an SIR epidemic on a
random intersection graph may be approximated by a (forward) branching process, yielding
a threshold parameter $R_*$ (see (3.1)) for the epidemic and the approximate probability
that such an epidemic becomes established when the population size $n$ is large. Unless the
infectious period $\mathcal{I}$ is constant, this branching process is multitype, its type space being
the support of $\mathcal{I}$ and hence in general uncountable. This infinite-type branching process
is studied separately in Section 4. In Section 3.3, we show how the susceptibility set of
an individual may be approximated by a (backward) branching process, which is
single-type even if $\mathcal{I}$ is not constant. Furthermore, we explain why, if $n$ is large, the proportion
of the population that is ultimately infected by an epidemic that becomes established is
approximately the probability that the backward branching process avoids extinction. The
above approximations are made fully rigorous by considering SIR epidemics on a sequence of 
random intersection graphs, indexed by the population size n, and proving associated limit 
theorems. These theorems are stated in Section 3.4 and proved in Section 5. Calculation 
of extinction probabilities for the forward and backward branching processes requires exact 
results concerning the final outcome and susceptibility sets for standard SIR epidemics in 
closed homogeneously mixing populations, which are given in Appendix B. 

3.2 Early stages of an epidemic 
3.2.1 Fixed infectious period 

Consider the epidemic model $\mathcal{E}^{(n)}(A, B, \mathcal{I})$ defined in Section 2.3 and, for simplicity,
suppose first that the infectious period is constant, i.e. there exists $\iota > 0$ such that
$\mathbb{P}(\mathcal{I} = \iota) = 1$. In the limit as the population size $n \to \infty$, the initial infective,
$i^*$ say, belongs to $X \sim \mathcal{MP}(A)$ cliques, having sizes
$\bar{Y}_1 + 1, \bar{Y}_2 + 1, \ldots, \bar{Y}_X + 1$, where, given $X$, the random variables
$\bar{Y}_1, \bar{Y}_2, \ldots, \bar{Y}_X$ are mutually independent and
$(\bar{Y}_i \mid X) \sim \mathcal{MP}(\tilde{B})$ ($i = 1, 2, \ldots, X$). The
size biasing comes in because the probability of being part of a clique is proportional to
its weight. Moreover, apart from $i^*$, these cliques are almost surely disjoint as $n \to \infty$.
The initial infective will trigger a local (within-clique) epidemic in each of the $X$ cliques it
belongs to. The group of initial susceptibles in a single clique that are infected through a
local epidemic started by $i^*$ is called a litter of $i^*$. (Note that a litter may be empty, i.e. if
no susceptible in the corresponding clique is infected.) Let $T(m)$ denote the size of a litter,
not counting the initial infective $i^*$, given that the clique has size $m + 1$. (We call $T(m)$
the size of a local epidemic or the size of a litter.) Then the total number of individuals
infected (excluding $i^*$) by the local epidemics in the cliques that $i^*$ belongs to is distributed
as

$$C^f = \sum_{i=1}^{X} T(\bar{Y}_i),$$

where $T(\bar{Y}_1), T(\bar{Y}_2), \ldots, T(\bar{Y}_X)$ are independent, since the infectious period is constant.

Now consider a typical individual, $j^*$ say, that is part of one of the litters of $i^*$. In the
limit as $n \to \infty$, (i) individual $j^*$ belongs to $\bar{X} \sim \mathcal{MP}(\tilde{A})$ cliques, in addition to
the clique $j^*$ was infected through (i.e. the one also containing $i^*$), having sizes distributed
independently as $\mathcal{MP}(\tilde{B}) + 1$ and (ii) apart from $j^*$, the $\bar{X} + 1$ cliques containing $j^*$
are disjoint. (The size biasing here arises because, in the construction of $G^{(n)}$, the probability
that a vertex joins a given clique is proportional to the weight of that vertex; see Remark 2.4.)
Individual $j^*$ will trigger a local epidemic in each of the $\bar{X}$ 'new' cliques it belongs to. The
total number of individuals infected (excluding $j^*$) in these $\bar{X}$ local epidemics (the sum of
the sizes of the litters of $j^*$) is distributed as

$$\bar{C}^f = \sum_{i=1}^{\bar{X}} T(\bar{Y}_i),$$

where, given $\bar{X}$, the random variables $T(\bar{Y}_1), T(\bar{Y}_2), \ldots, T(\bar{Y}_{\bar{X}})$ are independent.

The construction of the epidemic process may be continued in the obvious fashion. It
follows that, if the population size $n$ is large, the number of infected individuals in the early
stages of the epidemic process may be approximated by a (Galton-Watson) branching
process, with one initial ancestor, and offspring distribution that of $C^f$ in the initial generation
and of $\bar{C}^f$ in all subsequent generations. This approximation is made precise by using a
coupling argument in Section 5.1. The coupling between the epidemic and branching
processes breaks down when a clique used to spread a local epidemic intersects a previously
used clique, which, with probability tending to one as $n \to \infty$, happens if and only if the
branching process does not go extinct.
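Let

$$R_* = \mathbb{E}[\bar{C}^f] = \mathbb{E}_{\bar{Y}}[\mathbb{E}[T(\bar{Y}) \mid \bar{Y}]] \, \mathbb{E}[\bar{X}] = \mathbb{E}_{\bar{Y}}[\mathbb{E}[T(\bar{Y}) \mid \bar{Y}]] \, \mathbb{E}[\tilde{A}] \tag{3.1}$$

and, for $s \in [0, 1]$, let

$$f_{C^f}(s) = \mathbb{E}[s^{C^f}] = f_X(\mathbb{E}_{\bar{Y}}[f_{T(\bar{Y}) \mid \bar{Y}}(s)])$$

and

$$f_{\bar{C}^f}(s) = \mathbb{E}[s^{\bar{C}^f}] = f_{\bar{X}}(\mathbb{E}_{\bar{Y}}[f_{T(\bar{Y}) \mid \bar{Y}}(s)]).$$

Let $\rho$ be the survival probability of the above branching process (i.e. the probability that
it does not go extinct). Then, by standard branching process theory [20], if $R_* \le 1$ then
$\rho = 0$ and if $R_* > 1$ then

$$\rho = 1 - f_{C^f}(\sigma), \tag{3.2}$$

where $\sigma$ is the unique solution in $[0, 1)$ of the equation

$$f_{\bar{C}^f}(s) = s. \tag{3.3}$$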

The coupling of the epidemic and branching processes mentioned above implies that, if the
population size $n$ is suitably large, $R_*$ is a threshold parameter for the epidemic process
and the probability that an epidemic initiated by a single infective becomes established
and leads to a major outbreak is given approximately by $\rho$. Note that in [11], the notation
$R_0$ is used instead of $R_*$. We use the notation of [7, 8], because $R_0$ is usually defined as
the expected number of new direct infections caused by an infectious individual in the first
stages of an epidemic [2, 15, 29], while in (3.1) all individuals infected by a local epidemic
are 'assigned to' the initial infectious individual in the clique.
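
Once the two probability generating functions are available, the survival probability of a branching process whose initial generation has a different offspring law from later generations can be computed by fixed-point iteration. The sketch below is generic: the function names are hypothetical, and the Poisson generating functions in the example are stand-ins chosen for illustration, not the offspring laws of the epidemic model (which require the distribution of $T(m)$).

```python
import math

def smallest_fixed_point(pgf, tol=1e-12, max_iter=100_000):
    """Iterate s <- pgf(s) from s = 0; the iterates increase to the smallest root of pgf(s) = s in [0, 1]."""
    s = 0.0
    for _ in range(max_iter):
        s_next = pgf(s)
        if abs(s_next - s) < tol:
            return s_next
        s = s_next
    return s  # not converged within max_iter (convergence is slow near criticality)

def survival_probability(pgf_first, pgf_later):
    """Survival probability 1 - pgf_first(sigma), where sigma is the smallest root of pgf_later(s) = s."""
    sigma = smallest_fixed_point(pgf_later)
    return 1.0 - pgf_first(sigma)

def poisson_pgf(mean):
    """Generating function of a Poisson offspring law (illustrative stand-in only)."""
    return lambda s: math.exp(mean * (s - 1.0))

print(survival_probability(poisson_pgf(2.0), poisson_pgf(2.0)))  # supercritical example
print(survival_probability(poisson_pgf(0.5), poisson_pgf(0.5)))  # subcritical: close to 0
```

Starting the iteration at 0 matters: for a supercritical process the equation also has the root $s = 1$, and iterating from 0 converges to the smaller root, which is the extinction probability.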

3.2.2 General infectious period distribution 

When the infectious period is not constant we can still approximate the epidemic
$\mathcal{E}^{(n)}(A, B, \mathcal{I})$ by considering successive local epidemics as above, but the approximating
process is no longer a simple single-type branching process. There are two reasons for this. First,
the sizes of the litters of an individual, $i^*$ say, are not independent since the infectious period
of the initial infective in the corresponding cliques is the same (i.e. the infectious period
of $i^*$). Secondly, the infectious periods of infectives in a litter are not independent of the
size of that litter. These difficulties may be overcome by considering a multitype branching
process, in which individuals are typed by the length of their infectious period. If the
infectious period $\mathcal{I}$ has finite support then standard finite-type branching process theory
(see e.g. [20, Chapter 4]) may be used, so we now assume that $\mathcal{I}$ has infinite (possibly
uncountable) support.

In view of these observations, we approximate the early stages of the epidemic
$\mathcal{E}^{(n)}(A, B, \mathcal{I})$ by a multitype branching process

$$Z^f = Z^f(A, B, \mathcal{I}) = (Z^f_i, i \in \mathbb{Z}_+),$$

defined as follows. The type space is $(0, \infty]$, with the type of an individual being given by
the infectious period of the corresponding individual in the epidemic process. For $i \in \mathbb{Z}_+$,
$Z^f_i$ is a multiset of points in $(0, \infty]$ giving the types of individuals present in generation
$i$ of the branching process. (Note that if the distribution of $\mathcal{I}$ has atoms, at infinity or
otherwise, then $Z^f_i$ may contain repeated elements; on the other hand if the distribution
of $\mathcal{I}$ is continuous then, almost surely, all elements of $Z^f_i$ are distinct and hence $Z^f_i$ is
a set.) There is one initial ancestor, corresponding to the initial infective, $i^*$ say, in the
epidemic $\mathcal{E}^{(n)}(A, B, \mathcal{I})$ and its type is distributed as $\mathcal{I}$. As in the constant infectious
period case, $i^*$ belongs to $X \sim \mathcal{MP}(A)$ cliques, having sizes distributed independently as
$\bar{Y} + 1$, where $\bar{Y} \sim \mathcal{MP}(\tilde{B})$, and in $Z^f$ the offspring of the initial ancestor corresponds
to all the individuals infected in the local epidemics triggered by $i^*$ in these $X$ cliques, though now
of course we also keep track of their types (infectious periods). In the branching process,
a group of children corresponding to a litter in the epidemic process is also referred to as
a litter. The offspring of any individuals in a non-initial generation of $Z^f$ are defined in a
similar fashion, except $X$ is replaced by $\bar{X} \sim \mathcal{MP}(\tilde{A})$. Of course, the offspring of distinct
individuals in $Z^f$ are mutually independent.

The branching process $Z^f$, which we call a forward branching process because it
approximates the forward spread of the epidemic $\mathcal{E}^{(n)}(A, B, \mathcal{I})$, is analysed in Section 4. Let
$\bar{Z}^f$ be the multitype branching process defined analogously to $Z^f$, except the offspring
distribution in all generations of $\bar{Z}^f$ is that of the non-initial generations in $Z^f$. Let $\rho$ be
the probability that $Z^f$ survives and, for $x \in (0, \infty]$, let $\bar{\rho}(x)$ be the probability that
$\bar{Z}^f$ survives given that the ancestor has type $x$. Let $R_*$ be defined as in (3.1), where $T(m)$
is distributed as the size of a local epidemic, initiated by a single infective in a clique of
size $m + 1$, in which the infectious periods of infectives (including the initial one) are i.i.d.
copies of $\mathcal{I}$. (An expression for $\mathbb{E}_{\bar{Y}}[\mathbb{E}[T(\bar{Y}) \mid \bar{Y}]]$ is given by equation (B.7) in
Appendix B.2, thus enabling $R_*$ to be computed.) Then $\rho > 0$ if and only if $R_* > 1$ (see Theorem 4.2),
so $R_*$ is still a threshold parameter for the epidemic. Also, when $R_* > 1$, $\rho$ is given by an
infinite-type analogue of (3.2); see (4.4), which expresses $\rho$ as the expectation of a
functional of $\bar{\rho}$ with respect to the distribution (that of $\mathcal{I}$) of the ancestor's type $x$.
Furthermore, $\bar{\rho}$ satisfies a functional equation (see (4.3)), which is essentially an infinite-type
analogue of (3.3) and has at most one non-zero solution (see Lemma 4.1).

3.3 Final outcome of an epidemic 

Recall the definition of the susceptibility set of an individual given in Section 2.3. We 
require also the concept of a local susceptibility set, which is defined in exactly the same 
way as a susceptibility set but for an epidemic on a single clique. For $m = 0, 1, \ldots$, let
S(m) denote the size of a typical local susceptibility set of an individual in a clique of size 
m + 1, where S(m) does not include the individual itself. 

We may approximate the early growth of a susceptibility set of an individual, $i^*$ say, by
a branching process in much the same way as we did for the early stages of an epidemic. We
consider first those individuals, not including $i^*$ itself, who belong to a local susceptibility
set of $i^*$. These are the offspring of $i^*$ in the branching process. We next repeat this process
for each individual, $j^*$ say, in the first generation of the branching process to obtain the
second generation, and so on. This leads to a (backward) branching process

$$Z^b = Z^b(A, B, \mathcal{I}) = (Z^b_i, i \in \mathbb{Z}_+)$$

having one initial ancestor, in which the number of offspring of the ancestor is distributed
as

$$C^b = \sum_{i=1}^{X} S(\bar{Y}_i)$$

and the number of offspring of any subsequent individual is distributed as

$$\bar{C}^b = \sum_{i=1}^{\bar{X}} S(\bar{Y}_i),$$

where $X, \bar{X}, \bar{Y}_1, \bar{Y}_2, \ldots$ are independent, $X \sim \mathcal{MP}(A)$, $\bar{X} \sim \mathcal{MP}(\tilde{A})$ and
$\bar{Y}_i \sim \mathcal{MP}(\tilde{B})$ ($i = 1, 2, \ldots$).

Note that the local susceptibility set of an individual is independent of its infectious
period, so $Z^b$ is a single-type branching process; thus $Z^b_i$ is determined by its cardinality
$|Z^b_i|$, in contrast to $Z^f_i$ (which is single-type only if $\mathcal{I}$ is almost surely equal to a fixed
constant).

Let 

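$R^b_* = E[\tilde{C}^b] = E_{\tilde{Y}}[E[S(\tilde{Y}) \mid \tilde{Y}]]\, E[\tilde{A}]$ (3.4)

be the mean number of children of an individual in $\mathcal{Z}^b$ who is not the ancestor and, for 
$s \in [0, 1]$, define the probability generating functions 

$f_{C^b}(s) = E[s^{C^b}] = f_X(E_{\tilde{Y}}[f_{S(\tilde{Y}) \mid \tilde{Y}}(s)])$

and 

$f_{\tilde{C}^b}(s) = E[s^{\tilde{C}^b}] = f_{\tilde{X}}(E_{\tilde{Y}}[f_{S(\tilde{Y}) \mid \tilde{Y}}(s)]).$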

Denote by $p^b = p^b(A, B, \mathcal{I})$ the survival probability of $\mathcal{Z}^b$. Then, by standard branching 
process theory, if $R^b_* \le 1$ then $p^b = 0$ and if $R^b_* > 1$ then 

$p^b = 1 - f_{C^b}(\xi),$ (3.5)

where $\xi$ is the unique solution in $[0, 1)$ of the equation 

$f_{\tilde{C}^b}(s) = s.$ (3.6)

Note that an expression for $E_{\tilde{Y}}[f_{S(\tilde{Y}) \mid \tilde{Y}}(s)]$ is given by equation (B.8) in Appendix B.2, 
which enables $p^b$ to be computed. In connection with this computation, also recall that 
$f_X(s) = \phi_A(1 - s)$ and observe that $f_{\tilde{X}}(s) = \phi_{\tilde{A}}(1 - s) = -\phi'_A(1 - s)/E[A]$, where $\phi'_A$ is 
the derivative of $\phi_A$. 
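
The computation described by (3.5) and (3.6) can be sketched numerically. The following is a minimal illustration only: the paper's generating functions come from (B.8), whereas here both offspring laws are simply assumed to be Poisson, and the helper names `poisson_pgf`, `smallest_fixed_point` and `survival_probability` are hypothetical.

```python
import math


def poisson_pgf(mean):
    """Return the pgf s -> exp(-mean * (1 - s)) of a Poisson(mean) law.

    Assumption for illustration only: the paper's pgfs are not Poisson.
    """
    return lambda s: math.exp(-mean * (1.0 - s))


def smallest_fixed_point(pgf, tol=1e-12, max_iter=100_000):
    """Iterate s <- pgf(s) from s = 0.

    For a pgf this sequence increases monotonically to the smallest
    solution of pgf(s) = s in [0, 1].
    """
    s = 0.0
    for _ in range(max_iter):
        nxt = pgf(s)
        if abs(nxt - s) < tol:
            return nxt
        s = nxt
    return s


def survival_probability(pgf_ancestor, pgf_later):
    """Analogue of (3.5): 1 - f_ancestor(xi), with xi solving f_later(s) = s."""
    xi = smallest_fixed_point(pgf_later)
    return 1.0 - pgf_ancestor(xi)
```

Starting the iteration at 0 is the design choice that matters: it selects the root in [0, 1) when the mean offspring number exceeds 1 and returns (approximately) 1 otherwise.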

Before describing how the backward branching process $\mathcal{Z}^b$ is used to study the final 
outcome of an epidemic in a large population, we briefly discuss the relationship between 
the forward and backward branching processes. In particular we note two important 
consequences of this relationship. 

Remark 3.1. Let $G'$ be the Epidemic Generated Graph (see Section 2.3) for an epidemic 
on a single clique ($G$ say) of $m + 1$ individuals, labelled $0, 1, \ldots, m$. For distinct 
$i, j \in \{0, 1, \ldots, m\}$, let $\chi_{i,j} = 1$ if there is a chain of directed edges from $i$ to $j$ in $G'$ and 
let $\chi_{i,j} = 0$ otherwise. Then $T(m)$ and $S(m)$ are distributed as $\sum_{i=1}^{m} \chi_{0,i}$ and $\sum_{i=1}^{m} \chi_{i,0}$, 
respectively, so by symmetry, $E[T(m)] = m P(\chi_{0,1} = 1)$ and $E[S(m)] = m P(\chi_{1,0} = 1)$. 
Further, by symmetry, $P(\chi_{0,1} = 1) = P(\chi_{1,0} = 1)$, and it follows from (3.1) and (3.4) that 
$R^b_* = R_*$. Thus we use only the notation $R_*$. 

Remark 3.2. Consider the graphs $G$ and $G'$ of the previous remark, and suppose that the 
infectious period $\mathcal{I}$ is constant, say $P(\mathcal{I} = \iota) = 1$. Then $G'$ is obtained from the directed 
version of $G$ by deleting directed edges independently, each with probability $e^{-\iota}$. Thus, 
if $G''$ is obtained from $G'$ by reversing the direction of all arrows, then $G''$ and $G'$ are 
identically distributed, whence so are $T(m)$ and $S(m)$. It follows that in this case $p^b = p$. 
This argument breaks down when $\mathcal{I}$ is not constant. In that case, apart from the branching 
process $\mathcal{Z}^f$ being multitype, the directed edges from a given vertex in $G'$ are not independent, 
whence $T(m)$ and $S(m)$ have different distributions. Thus generally $p^b \neq p$. 
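
The quantities in Remarks 3.1 and 3.2 can be explored by simulation. The sketch below draws one Epidemic Generated Graph on a clique of m + 1 individuals and returns a sample of (T(m), S(m)). It assumes a pairwise contact rate of 1, so that an individual with infectious period x contacts each clique-mate with probability 1 - exp(-x); the function names and the `draw_period` callback are illustrative, not taken from the paper.

```python
import math
import random


def reachable(adjacency, start):
    """Return the set of vertices reachable from `start` by directed edges."""
    seen = {start}
    stack = [start]
    while stack:
        u = stack.pop()
        for v in adjacency[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def sample_T_and_S(m, draw_period, rng):
    """Sample (T(m), S(m)) for a clique with vertices 0, 1, ..., m.

    Vertex i draws one infectious period and then contacts each other
    vertex independently with probability 1 - exp(-period) (assumed
    contact rate 1). T counts vertices reachable from 0; S counts
    vertices from which 0 is reachable. Vertex 0 itself is excluded.
    """
    n = m + 1
    forward = [[] for _ in range(n)]
    backward = [[] for _ in range(n)]
    for i in range(n):
        contact_prob = 1.0 - math.exp(-draw_period(rng))
        for j in range(n):
            if j != i and rng.random() < contact_prob:
                forward[i].append(j)   # edge i -> j
                backward[j].append(i)  # same edge, reversed
    t = len(reachable(forward, 0)) - 1
    s = len(reachable(backward, 0)) - 1
    return t, s
```

Averaging many samples with a non-constant `draw_period` (for example `lambda r: r.expovariate(1.0)`) gives empirical means of T(m) and S(m) that should agree, as in Remark 3.1, while their empirical distributions generally differ, as in Remark 3.2.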

Now we describe the relationship between the backward branching process and the final 
outcome of an epidemic. Consider the epidemic model $\mathcal{E}^{(n)}(A, B, \mathcal{I})$ and suppose that the 
population size $n$ is large. Choose an initially susceptible individual uniformly at random 
from all initial susceptibles, $j$ say, and construct its susceptibility set on a generation basis 
as described above for $\mathcal{Z}^b$. Stop this construction when the total size of the susceptibility set 
becomes greater than $\log n$ or when the susceptibility set process goes extinct, whichever 
occurs first. The susceptibility set process can be coupled to the backward branching 
process $\mathcal{Z}^b$ so that, with probability tending to 1 as $n \to \infty$, the two coincide whilst their 
sizes are not greater than $\log n$. Also, the probability that the total progeny of $\mathcal{Z}^b$ is greater 
than $\log n$ tends to $p^b$ as $n \to \infty$. 

By symmetry, the initial infective in $\mathcal{E}^{(n)}(A, B, \mathcal{I})$, $i$ say, may be chosen by picking an 
individual uniformly at random from the population excluding $j$. Thus, if $j$'s susceptibility 
set process goes extinct before reaching size $\log n$ then the probability that $j$'s susceptibility 
set contains the initial infective (and hence that $j$ is ultimately infected by the epidemic) 
tends to zero as $n \to \infty$. Suppose instead that $j$'s susceptibility set process does reach 
size $\log n$. Then we choose the initial infective $i$ as above, construct the forward epidemic 
process from $i$ and determine whether or not the latter intersects the $\log n$ individuals in 
$j$'s partially constructed susceptibility set. If it does then $j$ is ultimately infected by the 
epidemic, otherwise $j$ remains uninfected. 

Recall that the forward epidemic process originating from $i$ is approximated by the 
branching process $\mathcal{Z}^f$. If $\mathcal{Z}^f$ goes extinct then, in the limit as $n \to \infty$, there are only finitely 
many individuals infected in the epidemic and hence the probability that the epidemic 
intersects $j$'s partially constructed susceptibility set tends to zero. If $\mathcal{Z}^f$ does not go extinct 
then, by exploiting a lower bounding branching process for the epidemic process, we show 
in Section 5.2 that, as $n \to \infty$, the epidemic process almost surely infects $\Theta(n)$ individuals 
and hence the probability that it intersects $j$'s partially constructed susceptibility set tends 
to one. 

The above implies that the asymptotic probability that an initial susceptible, chosen 
uniformly at random, is ultimately infected by a major outbreak is $p^b$. Hence the asymptotic 
expected proportion of the population ultimately infected by a major outbreak is also $p^b$. 
Now consider two distinct initial susceptibles chosen uniformly at random, $j_1$ and $j_2$ say, and 
construct their susceptibility sets on a generation basis as above, stopping each process if its 
size reaches $\log n$ or if the process goes extinct. The two partially constructed susceptibility 
set processes are asymptotically independent as $n \to \infty$, which enables a weak law of large 
numbers to be proved for the proportion of the population that is ultimately infected by a 
major outbreak. 



3.4 Limit theorems for SIR epidemics on random intersection 
graphs 

Let $\mathcal{R}^{(n)} = \mathcal{R}^{(n)}(A, B, \mathcal{I})$ be the set of ultimately recovered vertices, including the single 
initial infective, in the SIR epidemic $\mathcal{E}^{(n)}(A, B, \mathcal{I})$ on the random intersection graph 
$G^{(n)}$, constructed using the infectious period distribution $\mathcal{I}$ and the sequences $(A_i, i \in \mathbb{N})$, 
$(B_j, j \in \mathbb{N})$ (as described in Section 2.2). Our focus is on the properties of $|\mathcal{R}^{(n)}|$, the 
number of ultimately recovered individuals in the epidemic. For a branching process, $\mathcal{Z}$ say, let 
$|\mathcal{Z}| = \sum_{i=0}^{\infty} |Z_i|$ denote its total size (total progeny), including the ancestor. Recall that 
$\mathcal{Z}^f = \mathcal{Z}^f(A, B, \mathcal{I})$ and $\mathcal{Z}^b = \mathcal{Z}^b(A, B, \mathcal{I})$ are the (forward and backward) branching 
processes, which approximate the epidemic process and the process exploring a susceptibility 
set, respectively. Recall also that $p$ and $p^b$ are their respective survival probabilities. 

Our first theorem establishes the precise sense in which the forward process approxi- 
mates the early stages of an epidemic. 

Theorem 3.3. For all $k \in \mathbb{N}$, 

$\lim_{n \to \infty} P(|\mathcal{R}^{(n)}| = k) = P(|\mathcal{Z}^f| = k).$



The next result establishes the connection between the backward process and the pro- 
portion of individuals ultimately infected. 

Theorem 3.4. For every $0 < \epsilon < p^b$, 

$\lim_{n \to \infty} P\left( \left| n^{-1} |\mathcal{R}^{(n)}| - p^b \right| < \epsilon \right) = p.$



Theorems 3.3 and 3.4 are proved in Sections 5.1 and 5.2, respectively. Finally, we use 
these two results to establish the following convergence in distribution of the proportion of 
individuals ultimately infected in the epidemic process. 

Theorem 3.5. Let $T_F$ be a random variable with $P(T_F = p^b) = p = 1 - P(T_F = 0)$. Then, 
as $n \to \infty$, 

$n^{-1} |\mathcal{R}^{(n)}| \Rightarrow T_F.$

Proof. First note that Theorem 3.3 implies that, for any $\epsilon > 0$ and any $k \in \mathbb{N}$, 

$\liminf_{n \to \infty} P(n^{-1} |\mathcal{R}^{(n)}| < \epsilon) \ge P(|\mathcal{Z}^f| \le k),$

whence, letting $k \to \infty$, 

$\liminf_{n \to \infty} P(n^{-1} |\mathcal{R}^{(n)}| < \epsilon) \ge 1 - p.$ (3.7)

Suppose that $R_* \le 1$. Then $p = 0$ and (3.7) implies that 

$n^{-1} |\mathcal{R}^{(n)}| \Rightarrow 0$ as $n \to \infty$. (3.8)

On the other hand, suppose that $R_* > 1$, so $p > 0$. Then Theorem 3.4 implies that, for 
$0 < \epsilon < p^b$, $\limsup_{n \to \infty} P(n^{-1} |\mathcal{R}^{(n)}| < \epsilon) \le 1 - p$, which, together with (3.7), yields that, 
for such $\epsilon$, 

$\lim_{n \to \infty} P(n^{-1} |\mathcal{R}^{(n)}| < \epsilon) = 1 - p.$

The theorem then follows upon combining this observation with (3.8) and Theorem 3.4. □ 

4 Properties of the forward branching process 

In this section we study the survival probability of the branching process $\mathcal{Z}^f$ introduced 
in Section 3.2. Recall that individuals in $\mathcal{Z}^f$ are typed by the length of the infectious 
period of the corresponding individual in the epidemic process. There is one ancestor, i* 
say, whose type is distributed as $\mathcal{I}$ and who belongs to $X \sim \mathrm{MP}(A)$ cliques. (That is, 
the corresponding individual in the epidemic process $\mathcal{E}^{(n)}(A, B, \mathcal{I})$ belongs to $X \sim \mathrm{MP}(A)$ 
cliques.) Those cliques have sizes that are independent and identically distributed as $1 + \tilde{Y}$, 
where $\tilde{Y} \sim \mathrm{MP}(\tilde{B})$. The offspring of the ancestor correspond to the individuals who are 
infected, in the corresponding epidemic process, by the local epidemics triggered by i* in 
the $X$ cliques it belongs to. The offspring of i* are grouped into litters with each litter 
corresponding to a clique of i*. Note that some litters might be empty (if the epidemic fails 
to spread further into some cliques to which i* belongs). The offspring of any subsequent 
individual is defined similarly, except that such an individual belongs to $\tilde{X} \sim \mathrm{MP}(\tilde{A})$ 
cliques in addition to the clique it was infected through. The type space for $\mathcal{Z}^f$ is given by 
the support of $\mathcal{I}$, which is a subset of $(0, \infty]$. For ease of exposition, we assume that $\mathcal{I}$ has 
support $(0, \infty]$; extension to other cases is straightforward. 

We investigate the survival probability of $\mathcal{Z}^f$ using functionals defined on measurable 
test functions $h : (0, \infty] \to [0, 1]$ as follows (cf. [9, 10]). Let $h(x)$ be a given test function. 
Suppose that individuals in $\mathcal{Z}^f$ are marked independently, with an individual of type $x$ 
being marked with probability $h(x)$. Let $F(h)(x)$ be the probability that an ancestor of 
type $x$ has at least one marked child in a given litter and let $\Phi(h)(x)$ be the probability that 
an ancestor of type $x$ has at least one marked child. Recall that the probability generating 
function of $X$ is given by $f_X(s) = \phi_A(1 - s)$ ($s \in [0, 1]$), where $\phi_A(\theta) = E[e^{-\theta A}]$ is the 
moment generating function of $A$. It follows that 

$\Phi(h)(x) = 1 - \phi_A(F(h)(x)).$ (4.1)

Define the functional $\tilde{\Phi}(h)(x)$ similarly for the branching process $\tilde{\mathcal{Z}}^f$, defined in the final 
paragraph of Section 3.2.2; thus 

$\tilde{\Phi}(h)(x) = 1 - \phi_{\tilde{A}}(F(h)(x)).$ (4.2)

Let $p_i$ be the probability that generation $i$ of the branching process $\mathcal{Z}^f$ is non-empty, 
that is $p_i = P(|Z^f_i| > 0)$. By definition $p_i$ is non-increasing, so $p = \lim_{i \to \infty} p_i$ exists 
and is the probability of survival of the branching process. Let $\tilde{p}_i(x)$ be the probability 
that the lineage of an individual (i.e. the sub-process consisting of that individual and all 
its descendants), which is not the ancestor and has type $x$, survives for at least $i$ further 
generations and let $\tilde{p}(x) = \lim_{i \to \infty} \tilde{p}_i(x)$ be the probability that this lineage survives forever. 
Note that $\tilde{p}_1(x) = \tilde{\Phi}(1)(x)$, where 1 is the function which is equal to 1 on its entire domain. 
It is clear that $\tilde{p}(x)$ satisfies 

$\tilde{p}(x) = \tilde{\Phi}(\tilde{p})(x),$ (4.3)

since in order for the lineage of an individual to survive, at least one of the children of that 
individual must have a surviving lineage. Furthermore, 

$p = \int \Phi(\tilde{p})(x)\, P(\mathcal{I} \in dx) = E[\Phi(\tilde{p})(\mathcal{I})].$ (4.4)

Let $\tilde{\Phi}_i$ be the $i$-th iterate of $\tilde{\Phi}$ and note that $\tilde{p}_i(x) = \tilde{\Phi}_i(1)(x)$. The functionals $\Phi(h)(x)$ 
and $\tilde{\Phi}(h)(x)$ are monotonically increasing in $h(x)$. Therefore, $\tilde{p}(x) = \lim_{i \to \infty} \tilde{\Phi}_i(1)(x)$ is the 
pointwise maximal solution of (4.3). Note that, since $\tilde{\mathcal{Z}}^f$ is irreducible, either $\tilde{p}(x) = 0$ for 
all $x \in (0, \infty]$ or $\tilde{p}(x) > 0$ for all $x \in (0, \infty]$. The following lemma is proved and discussed 
in Appendix A. 

Lemma 4.1. There is at most one non-zero solution p(x) of (4.3). 
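
The characterisation of the maximal solution of (4.3) as the limit of iterates started from the constant function 1 can be illustrated on a toy model. The sketch below is not the paper's general setting: it assumes finitely many types, cliques of size exactly two, a pairwise contact rate of 1, and a degenerate weight so that the number of further cliques is Poisson with mean `lam`. Under these assumptions the litter functional reduces to F(h)(x) = (1 - exp(-x)) E[h(I)] and the functional in (4.2) to 1 - exp(-lam F(h)(x)). All names are hypothetical.

```python
import math


def iterate_survival(types, probs, lam, n_iter=2000):
    """Iterate h <- Phi(h) from h = 1 on a finite type space (toy model).

    types: possible infectious periods (math.inf allowed).
    probs: their probabilities (summing to 1).
    lam:   assumed Poisson mean of the number of further cliques.
    Returns the list of limiting values, one per type.
    """
    h = [1.0] * len(types)
    for _ in range(n_iter):
        mean_h = sum(p * v for p, v in zip(probs, h))  # E[h(I)]
        h = [
            1.0 - math.exp(-lam * (1.0 - math.exp(-x)) * mean_h)
            for x in types
        ]
    return h


def toy_threshold(types, probs, lam):
    """Mean offspring number in the toy model: lam * E[1 - exp(-I)]."""
    return lam * sum(p * (1.0 - math.exp(-x)) for x, p in zip(types, probs))
```

In this toy model the iterates decrease to a strictly positive limit when `toy_threshold` exceeds 1 and to zero otherwise, mirroring the dichotomy stated for (4.3).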

Now recall the definition of $R_*$ from (3.1), where $\tilde{Y} \sim \mathrm{MP}(\tilde{B})$ and as before let $T(m)$ 
denote the size of a litter, in a clique of $m$ initial susceptibles, in which the infectious periods 
of infectives are i.i.d. copies of $\mathcal{I}$. It is convenient here to show explicitly the dependence 
on $\mathcal{I}$ and write $T(m) = T(m, \mathcal{I})$, so 

$R_* = E_{\tilde{Y}}[E[T(\tilde{Y}, \mathcal{I}) \mid \tilde{Y}]]\, E[\tilde{A}].$

Theorem 4.2. The survival probability satisfies $p > 0$ if and only if $R_* > 1$. 

Proof. Suppose first that $R_* > 1$. For $k \in \mathbb{Z}_+$, let $L(k, \mathcal{I}) = E[T(\tilde{Y}, \mathcal{I}) \mid \tilde{Y} = k]$. Then there 
exists $K \in \mathbb{N}$ such that 

$E[\tilde{A}] \sum_{k=0}^{K} L(k, \mathcal{I}) P(\tilde{Y} = k) > 1.$ (4.5)

For $\epsilon > 0$, let $\mathcal{I}_\epsilon$ be the discrete random variable obtained from $\mathcal{I}$ by $\mathcal{I}_\epsilon = \epsilon \lfloor \mathcal{I}/\epsilon \rfloor$ (with 
the convention that $\lfloor \infty \rfloor = \infty$) and note that $\mathcal{I}_\epsilon$ is stochastically smaller than $\mathcal{I}$. Since 
$L(k, \mathcal{I})$ depends on the realisation of an Epidemic Generated Graph defined on a finite 
clique, there exists $\epsilon > 0$ such that 

$E[\tilde{A}] \sum_{k=0}^{K} L(k, \mathcal{I}_\epsilon) P(\tilde{Y} = k) > 1.$

Analogously to the derivation of (4.5), there exists $K'_\epsilon \in \mathbb{N}$ such that for 
$\mathcal{I}'_\epsilon = \mathcal{I}_\epsilon 1(\mathcal{I}_\epsilon \notin (K'_\epsilon, \infty))$, we have 

$E[\tilde{A}] \sum_{k=0}^{K} L(k, \mathcal{I}'_\epsilon) P(\tilde{Y} = k) > 1.$ (4.6)

Consider the branching process $\tilde{\mathcal{Z}}^f(A, B, \mathcal{I}'_\epsilon)$, which has finitely many types and is 
irreducible. Let $M$ be the mean offspring matrix of $\tilde{\mathcal{Z}}^f(A, B, \mathcal{I}'_\epsilon)$. Note that whether or 
not an individual in a clique becomes infected is independent of that individual's own 
infectious period. It follows that the rows of $M$ are each proportional to the probability 
mass function of $\mathcal{I}'_\epsilon$, so $M$ has rank one and the maximal eigenvalue of $M$ is given by its 
trace, which is easily seen to be equal to the left hand side of (4.6). Therefore, if $R_* > 1$, the 
branching process $\tilde{\mathcal{Z}}^f(A, B, \mathcal{I})$ dominates the irreducible finite-type supercritical branching 
process $\tilde{\mathcal{Z}}^f(A, B, \mathcal{I}'_\epsilon)$, which we know from standard theory [20, Theorem 4.2.2] has a strictly 
positive probability of survival. Thus $\tilde{p}(x) > 0$ for all $x \in (0, \infty]$; equation (4.4) then implies 
that $p > 0$. 
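
The linear-algebra step in this argument (a matrix whose rows are all proportional to one probability mass function has rank one, and its maximal eigenvalue equals its trace) can be checked numerically. The sketch below uses illustrative numbers only; `u` plays the role of the mean total offspring of each type and `v` the role of the probability mass function of the discretised infectious period, and the function names are hypothetical.

```python
def rank_one_matrix(u, v):
    """Return M with M[i][j] = u[i] * v[j], so every row is proportional to v."""
    return [[ui * vj for vj in v] for ui in u]


def trace(matrix):
    """Sum of the diagonal entries."""
    return sum(matrix[i][i] for i in range(len(matrix)))


def dominant_eigenvalue(matrix, n_iter=200):
    """Power iteration with max-norm scaling, for a non-negative matrix."""
    n = len(matrix)
    x = [1.0] * n
    value = 0.0
    for _ in range(n_iter):
        y = [sum(matrix[i][j] * x[j] for j in range(n)) for i in range(n)]
        value = max(abs(c) for c in y)
        if value == 0.0:
            return 0.0
        x = [c / value for c in y]
    return value


# Illustrative values: three types with mean offspring numbers u and pmf v.
u = [1.5, 0.5, 2.0]
v = [0.2, 0.3, 0.5]
M = rank_one_matrix(u, v)
```

Here both `trace(M)` and `dominant_eigenvalue(M)` equal the inner product of `u` and `v`, which is the mean offspring number averaged over the type distribution.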

For $R_* \le 1$ we use a similar argument to [10]. Suppose that $R_* \le 1$ and that $\tilde{p}(x) > 0$ for 
some (and thus all) $x \in (0, \infty]$. Recall that $\tilde{\Phi}(\tilde{p})(x)$ is the probability that, in $\tilde{\mathcal{Z}}^f(A, B, \mathcal{I})$ 
and with individuals of type $x$ being marked with probability $\tilde{p}(x)$, an individual of type 
$x$ has at least one marked child. Note that this probability is strictly smaller than the 
expectation of the number, $T_M(x, \tilde{p})$ say, of marked children of such an individual. Let 
$T(x, m, \mathcal{I})$ denote the size of a single-clique epidemic with $m$ initial susceptibles and a 
single initial infective which has infectious period $x$. Then, again exploiting the fact that 
whether or not an individual is infected is independent of its infectious period, we find that 

$E[T_M(x, \tilde{p})] = E[\tilde{A}]\, E_{\tilde{Y}}[E[T(x, \tilde{Y}, \mathcal{I}) \mid \tilde{Y}]]\, E[\tilde{p}(\mathcal{I})],$

whence, recalling (4.3), 

$\tilde{p}(x) = \tilde{\Phi}(\tilde{p})(x) < E[\tilde{A}]\, E_{\tilde{Y}}[E[T(x, \tilde{Y}, \mathcal{I}) \mid \tilde{Y}]]\, E[\tilde{p}(\mathcal{I})].$ (4.7)

Note that if $x$ is a realisation of a random variable $\mathcal{I}_0$ that is distributed as $\mathcal{I}$, then 
$E[T(m, \mathcal{I})] = E_{\mathcal{I}_0}[E[T(\mathcal{I}_0, m, \mathcal{I}) \mid \mathcal{I}_0]]$ and (4.7) implies that $E[\tilde{p}(\mathcal{I})] < R_* E[\tilde{p}(\mathcal{I})]$. It then 
follows that $R_* > 1$, which is a contradiction. Thus, if $R_* \le 1$ then $\tilde{p}(x)$ is identically zero 
on the support of $\mathcal{I}$ and it then follows from (4.4) that $p = 0$. □ 



5 Proofs 

In this section we give formal proofs of Theorems 3.3 and 3.4. Recall the probability space 
$(\Omega, \mathcal{F}, \nu)$ defined in Section 2.2, where $\Omega$ is the product space of non-negative real-valued 
infinite sequences $(A_i, i \in \mathbb{N})$ and $(B_j, j \in \mathbb{N})$ and $\nu$ is the appropriate (product) measure 
determined by the distributions of $A$ and $B$. In the proofs we consider processes which 
depend on $\omega \in \Omega$, that is on the sequences $(A_i, i \in \mathbb{N})$ and $(B_j, j \in \mathbb{N})$. The measure 
governing a process conditioned on $\omega$ is denoted by $P_\omega$ and the corresponding expectation 
by $E_\omega$. We use the notation $X_n \xrightarrow{p_\nu} X$ (as $n \to \infty$) to denote that $X_n$ converges in probability to 
$X$ as $n \to \infty$, with respect to the measure $\nu$. That is, $X_n \xrightarrow{p_\nu} X$ means that for every 
$\epsilon > 0$, $\delta > 0$, we have $\nu(|X_n - X| > \epsilon) < \delta$ for all sufficiently large $n \in \mathbb{N}$. In particular, we 
often use the notation $P_\omega(X_n \in A) \xrightarrow{p_\nu} P(X \in A)$, which is to be interpreted as meaning 
that, for a subset $A$ of the state space of $X_n$ and $X$, we have that for every $\epsilon > 0$, 

$\int 1(|P_\omega(X_n \in A) - P(X \in A)| > \epsilon)\, \nu(d\omega) \to 0$ as $n \to \infty$. (5.1)

We prove the following conditioned versions of Theorems 3.3 and 3.4, in which $\mathcal{R}^{(n)}(\omega, \mathcal{I})$ 
denotes the set of ultimately recovered vertices, including the single initial infective, in an 
SIR epidemic (as defined in Section 2.3) on the random intersection graph $G^{(n)}$, constructed 
using the infectious period distribution $\mathcal{I}$ and the sequences $(A_i, i \in \mathbb{N})$, $(B_j, j \in \mathbb{N})$ denoted 
by $\omega \in \Omega$. 

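Theorem 5.1. For $k \in \mathbb{N}$, we have 

$P_\omega(|\mathcal{R}^{(n)}(\omega, \mathcal{I})| = k) \xrightarrow{p_\nu} P(|\mathcal{Z}^f(A, B, \mathcal{I})| = k)$ as $n \to \infty$.

Theorem 5.2. For every $0 < \epsilon < p^b(A, B, \mathcal{I})$, 

$P_\omega\left( \left| n^{-1} |\mathcal{R}^{(n)}(\omega, \mathcal{I})| - p^b(A, B, \mathcal{I}) \right| < \epsilon \right) \xrightarrow{p_\nu} p(A, B, \mathcal{I})$ as $n \to \infty$.

Proofs of Theorems 3.3 and 3.4. Note that, for fixed $k \in \mathbb{N}$, the sequence of random 
variables $(P_\omega(|\mathcal{R}^{(n)}(\omega, \mathcal{I})| = k), n \in \mathbb{N})$ is uniformly integrable, so Theorem 3.3 follows 
immediately from Theorem 5.1 (and [18, Theorem 7.10.3]), by taking expectations with respect 
to the measure $\nu$. Theorem 3.4 follows similarly from Theorem 5.2. □ 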

5.1 Proof of Theorem 5.1 

In this proof we use three processes, 

• the branching process $\mathcal{Z}^f = \mathcal{Z}^f(A, B, \mathcal{I})$, 

• the branching process $\mathcal{Z}^{(n)} = \mathcal{Z}^f(A^{(n)}, B^{(n)}, \mathcal{I})$, defined similarly to $\mathcal{Z}^f(A, B, \mathcal{I})$ but 
with $A$ and $B$ replaced respectively by $A^{(n)}$ and $B^{(n)}$, defined in (2.7) and (2.8), 

• the exploration process of the Epidemic Generated Graph on $G^{(n)}$, denoted by 
$\mathcal{R}^{(n)} = \mathcal{R}^{(n)}(\omega, \mathcal{I}) = (\mathcal{R}^{(n)}_0, \mathcal{R}^{(n)}_1, \ldots)$. 

In the exploration process, $\mathcal{R}^{(n)}_0$ denotes the initially infective vertex $v_0$, $\mathcal{R}^{(n)}_1$ denotes the 
subset of vertices in $V^{(n)}$ that in the Epidemic Generated Graph have an edge to them 
from $v_0$, $\mathcal{R}^{(n)}_2$ denotes the subset of vertices in $V^{(n)} \setminus (\mathcal{R}^{(n)}_0 \cup \mathcal{R}^{(n)}_1)$ that in the Epidemic 
Generated Graph have an edge to them from at least one member of $\mathcal{R}^{(n)}_1$, and so on. With 
slight abuse of notation we now use $\mathcal{R}^{(n)}$ for the exploration process, where previously it 
was the set of ultimately recovered vertices in $\mathcal{E}^{(n)}$. As with the branching process $\mathcal{Z}^f$, 
$|\mathcal{R}^{(n)}| = \sum_{i=0}^{\infty} |\mathcal{R}^{(n)}_i|$ is the total number of ultimately recovered vertices; note that this has 
precisely the same meaning as in Section 3.4. 

To prove Theorem 5.1 we first show that the distribution of the total size of $\mathcal{Z}^{(n)}$ is 
approximately that of $\mathcal{Z}^f$, then that the distribution of the total size of $\mathcal{R}^{(n)}$ is approximately 
that of $\mathcal{Z}^{(n)}$. 

Lemma 5.3. For $k \in \mathbb{N}$, it holds that $P_\omega(|\mathcal{Z}^{(n)}| = k) \xrightarrow{p_\nu} P(|\mathcal{Z}^f| = k)$ as $n \to \infty$. 

Proof. Recall that a litter in a branching process is a group of children corresponding with 
the number of individuals infected in a local epidemic in one clique, excluding the initial 
susceptible. Let the total number of (possibly empty) litters in $\mathcal{Z}^f$ and $\mathcal{Z}^{(n)}$ be denoted 
by $H$ and $H^{(n)}$, respectively. Note that if $X_n \Rightarrow X$, then $\mathrm{MP}(X_n) \Rightarrow \mathrm{MP}(X)$ [18, 
Theorem 7.2.19]. Recall further that $A^{(n)} \Rightarrow A$ and $B^{(n)} \Rightarrow B$ as $n \to \infty$. These latter 
convergence results also hold for the size-biased variants, as shown just below equation (2.1). 
It follows that, as $n \to \infty$, the number and sizes of litters spawned by a typical individual 
in $\mathcal{Z}^{(n)}$ converge in distribution to those of a corresponding typical individual in $\mathcal{Z}^f$. Hence, 
for $k \in \mathbb{N}$ and $l \in \mathbb{Z}_+$, 

$P_\omega(|\mathcal{Z}^{(n)}| = k, H^{(n)} = l) \xrightarrow{p_\nu} P(|\mathcal{Z}^f| = k, H = l)$ as $n \to \infty$.

Therefore, for every $L \in \mathbb{N}$, we have 

$P_\omega(|\mathcal{Z}^{(n)}| = k, H^{(n)} \le L) \xrightarrow{p_\nu} P(|\mathcal{Z}^f| = k, H \le L)$ as $n \to \infty$.

Note that 

$P_\omega(|\mathcal{Z}^{(n)}| = k) = P_\omega(|\mathcal{Z}^{(n)}| = k, H^{(n)} \le L) + P_\omega(|\mathcal{Z}^{(n)}| = k, H^{(n)} > L)$

and 

$P(|\mathcal{Z}^f| = k) = P(|\mathcal{Z}^f| = k, H \le L) + P(|\mathcal{Z}^f| = k, H > L).$

Now fix $k \in \mathbb{N}$ and note that the probability of the intersection of the following events 
can be made arbitrarily close to 0 by making $c_1 > 0$ and $c_2 > 0$ sufficiently small and $L \in \mathbb{N}$ 
sufficiently large ($L$ might depend on $c_2$): 

(i) $|\mathcal{Z}^f| = k$, 

(ii) $H > L$, 

(iii) the first $k$ vertices evaluated in the branching process $\mathcal{Z}^f$ all have infectious periods 
larger than $c_1$, and 

(iv) at least $c_2 L$ out of the first $L$ cliques evaluated in $\mathcal{Z}^f$ have size $\ge 2$. 

The probability that neither (iii) nor (iv) holds can also be made arbitrarily close to 0 by 
tuning $c_1$ and $c_2$ (recall that $P(\mathcal{I} = 0) = 0$). 

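Combining these observations, it follows that for every $\epsilon > 0$, there exists $L \in \mathbb{N}$, such 
that for all $l > L$, 

$P(|\mathcal{Z}^f| = k) \le P(|\mathcal{Z}^f| = k, H \le l) + \epsilon/3.$ (5.2)

Note that the probability that (iii) does not hold is the same for $\mathcal{Z}^{(n)}$ and $\mathcal{Z}^f$; whilst given 
any $\delta > 0$, the fact that $\mathrm{MP}(B^{(n)}) \Rightarrow \mathrm{MP}(B)$ implies that there exists $N' \in \mathbb{N}$ such that 
the probability that (iv) does not hold is at most $\delta/3$ for all $n \ge N'$. It then follows that 
for given $\epsilon > 0$, there exists $L' \in \mathbb{N}$, such that for all $l > L'$, 

$\nu\left( P_\omega(|\mathcal{Z}^{(n)}| = k) \le P_\omega(|\mathcal{Z}^{(n)}| = k, H^{(n)} \le l) + \epsilon/3 \right) > 1 - \delta/2$ (5.3)

for all sufficiently large $n$. Now 

$P_\omega(|\mathcal{Z}^{(n)}| = k, H^{(n)} \le l) \xrightarrow{p_\nu} P(|\mathcal{Z}^f| = k, H \le l)$ as $n \to \infty$

implies that for all $\epsilon > 0$ and $\delta > 0$, 

$\nu\left( |P_\omega(|\mathcal{Z}^{(n)}| = k, H^{(n)} \le l) - P(|\mathcal{Z}^f| = k, H \le l)| < \epsilon/3 \right) > 1 - \delta/2,$ (5.4)

for all sufficiently large $n$. Now, using the triangle inequality, 

$|P_\omega(|\mathcal{Z}^{(n)}| = k) - P(|\mathcal{Z}^f| = k)| \le |P_\omega(|\mathcal{Z}^{(n)}| = k) - P_\omega(|\mathcal{Z}^{(n)}| = k, H^{(n)} \le l)|$
$\quad + |P_\omega(|\mathcal{Z}^{(n)}| = k, H^{(n)} \le l) - P(|\mathcal{Z}^f| = k, H \le l)|$
$\quad + |P(|\mathcal{Z}^f| = k) - P(|\mathcal{Z}^f| = k, H \le l)|,$

whence, noting that the final term is independent of $\omega$, 

$\nu\left( |P_\omega(|\mathcal{Z}^{(n)}| = k) - P(|\mathcal{Z}^f| = k)| > \epsilon \right)$
$\quad \le \nu\left( P_\omega(|\mathcal{Z}^{(n)}| = k) > P_\omega(|\mathcal{Z}^{(n)}| = k, H^{(n)} \le l) + \epsilon/3 \right)$
$\quad + \nu\left( |P_\omega(|\mathcal{Z}^{(n)}| = k, H^{(n)} \le l) - P(|\mathcal{Z}^f| = k, H \le l)| > \epsilon/3 \right)$
$\quad + 1\left( P(|\mathcal{Z}^f| = k) > P(|\mathcal{Z}^f| = k, H \le l) + \epsilon/3 \right).$

By choosing $l$ large enough, it follows, using (5.2), (5.3) and (5.4), that for all sufficiently 
large $n$, 

$\nu\left( |P_\omega(|\mathcal{Z}^{(n)}| = k) - P(|\mathcal{Z}^f| = k)| > \epsilon \right) < \delta/2 + \delta/2 + 0 = \delta$

and the lemma then follows. □ 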
Lemma 5.4. For $k \in \mathbb{N}$, $P_\omega(|\mathcal{Z}^{(n)}| \le k) - P_\omega(|\mathcal{R}^{(n)}| \le k) \xrightarrow{p_\nu} 0$ as $n \to \infty$. 

Proof. The proof follows from a standard coupling argument, described below. Firstly 
though, for each $n \in \mathbb{N}$, let $v^{(n)}_0$ be a vertex chosen uniformly at random from $V^{(n)}$ and 
let $v^{(n)}_1, v^{(n)}_2, \ldots$ be independently chosen vertices from $V^{(n)}$, where the probability that a 
given vertex is chosen is proportional to its $A$-weight. Let $a^{(n)}_0, a^{(n)}_1, \ldots$ be the respective 
$A$-weights of $v^{(n)}_0, v^{(n)}_1, \ldots$. Let $\mathcal{I}^{(n)}_0$ be the type assigned to vertex $v^{(n)}_0$. Let $v'^{(n)}_1, v'^{(n)}_2, \ldots$ be 
independently chosen vertices (representing cliques) from $V'^{(n)}$, where the probability that 
a given vertex is chosen is proportional to its $B$-weight. The $B$-weights of $v'^{(n)}_1, v'^{(n)}_2, \ldots$ 
are denoted by $b^{(n)}_1, b^{(n)}_2, \ldots$, respectively. Let the random variable 

$T^{(n)} = \min(i \in \mathbb{N} : v^{(n)}_i = v^{(n)}_j \text{ for some } j < i)$

be the smallest index at which a vertex from $V^{(n)}$ is chosen a second time. Similarly, define 

$T'^{(n)} = \min(i \in \mathbb{N} : v'^{(n)}_i = v'^{(n)}_j \text{ for some } j < i).$

The constructions of $\mathcal{Z}^{(n)}$ and $\mathcal{R}^{(n)}$ are coupled as follows. The ancestor of $\mathcal{Z}^{(n)}$ spawns 
a $\mathcal{P}(a^{(n)}_0)$ number of (possibly empty) litters, $l'$ say. The cliques that the initial infective 
in $\mathcal{R}^{(n)}$ belongs to are given by $v'^{(n)}_1, v'^{(n)}_2, \ldots, v'^{(n)}_{l'}$, which might contain duplicates; 
the $B$-weights associated with these litters are $b^{(n)}_1, b^{(n)}_2, \ldots, b^{(n)}_{l'}$. If $T'^{(n)} > l'$, then there 
are no duplicates amongst $v'^{(n)}_1, v'^{(n)}_2, \ldots, v'^{(n)}_{l'}$ and the processes stay coupled. If not, the 
construction can be continued but the details are not important for our purposes. 

If the coupling continues, the sizes of the litters (recall that litters are defined both for the 
epidemic process and the branching process) are then determined. For each $i = 1, 2, \ldots, l'$, 
the size of litter $i$ is distributed as the number of initially susceptible individuals which are 
ultimately infected by a local epidemic in a group with one initially infectious individual, 
having infectious period $\mathcal{I}^{(n)}_0$, and a $\mathcal{P}(b^{(n)}_i)$ distributed number of initially susceptible 
individuals. The litter sizes are all independent. Say that the total number of vertices in 
the $l'$ litters is $l$, then they get $A$-weights $a^{(n)}_1, \ldots, a^{(n)}_l$ and types $\mathcal{I}^{(n)}_1, \ldots, \mathcal{I}^{(n)}_l$, 
which are i.i.d. and distributed as $\mathcal{I}$. If $l < T^{(n)}$ the coupling continues and the generation 
1 vertices are $v^{(n)}_1, v^{(n)}_2, \ldots, v^{(n)}_l$. The coupling now proceeds in the obvious way. Note that 
in this construction we have not yet decided which vertices are in the same clique (of the 
random intersection graph) as $v^{(n)}_0$ but are not infected by the local epidemic. 

Let $H^{(n)}$ be as in the proof of Lemma 5.3 and let $H^{(n)}_*$ be the corresponding number 
for $\mathcal{R}^{(n)}$. We need to prove that for $k \in \mathbb{N}$ and $l \in \mathbb{Z}_+$, 

$P_\omega(|\mathcal{Z}^{(n)}| = k, H^{(n)} = l) - P_\omega(|\mathcal{R}^{(n)}| = k, H^{(n)}_* = l) \xrightarrow{p_\nu} 0$ as $n \to \infty$,

and then deduce the statement of the lemma as in the latter part of the proof of Lemma 5.3. 
Note that the coupling gives 

$P_\omega(|\mathcal{Z}^{(n)}| = k, H^{(n)} = l, T^{(n)} > k, T'^{(n)} > l)$
$\quad = P_\omega(|\mathcal{R}^{(n)}| = k, H^{(n)}_* = l, T^{(n)} > k, T'^{(n)} > l).$ (5.5)

Furthermore, letting $C^{(n)}(k, l) = \{T^{(n)} \le k\} \cup \{T'^{(n)} \le l\}$, we have 

$P_\omega(|\mathcal{Z}^{(n)}| = k, H^{(n)} = l) = P_\omega(|\mathcal{Z}^{(n)}| = k, H^{(n)} = l, T^{(n)} > k, T'^{(n)} > l)$
$\quad + P_\omega(|\mathcal{Z}^{(n)}| = k, H^{(n)} = l, C^{(n)}(k, l)).$

Note that the second term on the right hand side of this expression is bounded above by 
$P_\omega(C^{(n)}(k, l))$. 

Recall from Section 2.2 that $\mu = E[A] = \alpha E[B] < \infty$, which implies that the total 
weight of vertices in $V^{(n)}$ with weight exceeding $\log n$ is $\nu$-almost surely $o(n)$. (To show 
this, note that, since $\mu < \infty$, for any $N > 0$, 

$n^{-1} \sum_{i=1}^{n} A_i 1(A_i > N) \to E[A 1(A > N)]$ almost surely as $n \to \infty$

and $E[A 1(A > N)] \to 0$ as $N \to \infty$.) A similar result holds for the weights of the vertices 
in $V'^{(n)}$. Hence, for every $k, l \in \mathbb{N}$, the probability that both $\max(a^{(n)}_i : 0 \le i \le k) \le \log n$ 
and $\max(b^{(n)}_j : 1 \le j \le l) \le \log n$ converges to 1 as $n \to \infty$. Thus, the total weight of 
the first $k$ vertices and the first $l$ litters chosen in the branching process is $\nu$-almost surely 
$O(\log n)$. By a birthday problem argument we deduce that $P_\omega(C^{(n)}(k, l)) \xrightarrow{p_\nu} 0$ as $n \to \infty$. (Note 
that if $M_n(k)$ is the number of distinct pairs $(i, j)$ with $0 \le i < j \le k$ and $v^{(n)}_i = v^{(n)}_j$, then 
under the above restrictions, $E_\omega[M_n(k)]$ is bounded above by a quantity that converges to 0 
in $\nu$-probability.) Thus, for every $k, l \in \mathbb{N}$, 

$P_\omega(|\mathcal{Z}^{(n)}| = k, H^{(n)} = l) - P_\omega(|\mathcal{Z}^{(n)}| = k, H^{(n)} = l, T^{(n)} > k, T'^{(n)} > l) \xrightarrow{p_\nu} 0.$

Similarly, we deduce that, again for all $k, l \in \mathbb{N}$, 

$P_\omega(|\mathcal{R}^{(n)}| = k, H^{(n)}_* = l) - P_\omega(|\mathcal{R}^{(n)}| = k, H^{(n)}_* = l, T^{(n)} > k, T'^{(n)} > l) \xrightarrow{p_\nu} 0,$

which, together with (5.5), yields the lemma. □ 
Theorem 5.1 follows immediately by combining Lemmas 5.3 and 5.4. 



5.2 Proof of Theorem 5.2 

Before considering susceptibility sets and backward branching processes, we prove the fol- 
lowing extension of Lemma 5.3 which is required later in this section. 

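Lemma 5.5. $p(A^{(n)}, B^{(n)}, \mathcal{I}) \xrightarrow{p_\nu} p(A, B, \mathcal{I})$ as $n \to \infty$. 

Proof. For every $k \in \mathbb{Z}_+$, define the random variable 

$\mathcal{I}_k = 2^{-k} \lfloor 2^k \mathcal{I} \rfloor$ if $\mathcal{I} < 2^k$, 
$\mathcal{I}_k = 2^k$ if $\mathcal{I} \in [2^k, \infty)$, 
$\mathcal{I}_k = \infty$ if $\mathcal{I} = \infty$. 

That is, $\mathcal{I}_k$ is a random variable which can take only finitely many values and for 
$j = 1, 2, \ldots, 4^k - 1$, 

$P(\mathcal{I}_k = j 2^{-k}) = P(\mathcal{I} \in [j 2^{-k}, (j + 1) 2^{-k})),$

while $P(\mathcal{I}_k = 2^k) = P(\mathcal{I} \in [2^k, \infty))$ and $P(\mathcal{I}_k = \infty) = P(\mathcal{I} = \infty)$. It is clear that $\mathcal{I}_k$ converges to $\mathcal{I}$ 
as $k \to \infty$ and that $\mathcal{I}_k$ is stochastically smaller than $\mathcal{I}_{k+1}$ for all $k \in \mathbb{Z}_+$. 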
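
The dyadic discretisation used in this proof is easy to state in code. The following sketch applies the three-case definition to a single realised infectious period; the function name `discretise` is illustrative.

```python
import math


def discretise(x, k):
    """Map an infectious period x in [0, inf] to its level-k dyadic version.

    Values below 2**k are rounded down to a multiple of 2**-k, finite
    values at or above 2**k are capped at 2**k, and an infinite period
    stays infinite.
    """
    if math.isinf(x):
        return math.inf
    if x >= 2 ** k:
        return float(2 ** k)
    return math.floor(2 ** k * x) / 2 ** k
```

For a fixed period the discretised value never exceeds the original and never decreases as `k` grows, which is the pointwise form of the stochastic ordering used in the proof.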

For non-negative random variables $X$ and $Y$, the function $\tilde{p}(X, Y, \mathcal{I}_k)$ is pointwise 
non-decreasing in $k$, since it is the survival probability of a branching process and 
(stochastically) increasing the distribution of the infectious periods, and thus also of the offspring 
distribution, cannot decrease the survival probability of the process. By monotonicity we 
have that $\lim_{k \to \infty} \tilde{p}(X, Y, \mathcal{I}_k)$ exists pointwise, and by the monotone convergence theorem 
this limit satisfies (4.3) for $\tilde{p}(X, Y, \mathcal{I})$. By Lemma 5.3 we know that for every $k \in \mathbb{N}$, 
$P_\omega(|\mathcal{Z}^{(n)}| > k) \xrightarrow{p_\nu} P(|\mathcal{Z}^f| > k)$ as $n \to \infty$. This implies that for every $\epsilon > 0$ and $\delta > 0$, there exists 
$N_0 \in \mathbb{N}$ such that for $n > N_0$, we have 

$\nu\left( p(A^{(n)}, B^{(n)}, \mathcal{I}) < p(A, B, \mathcal{I}) + \epsilon \right) > 1 - \delta/2.$ (5.6)

Furthermore, for every $\epsilon > 0$, there exists $K \in \mathbb{N}$ such that for $k > K$, we have 

$p(A, B, \mathcal{I}_k) > p(A, B, \mathcal{I}) - \epsilon/2.$

Similarly, for every $\epsilon > 0$, $\delta > 0$ and $k \in \mathbb{N}$, there exists $N_k \in \mathbb{N}$ such that for $n > N_k$, we 
have 

$\nu\left( p(A^{(n)}, B^{(n)}, \mathcal{I}_k) > p(A, B, \mathcal{I}_k) - \epsilon/2 \right) > 1 - \delta/2,$

while for every $k \in \mathbb{N}$ (and $\omega \in \Omega$), $p(A^{(n)}, B^{(n)}, \mathcal{I}) \ge p(A^{(n)}, B^{(n)}, \mathcal{I}_k)$. Combining these 
statements establishes that, for every $\epsilon > 0$ and $\delta > 0$, there exists $N \in \mathbb{N}$ such that for all 
$n > N$, we have 

$\nu\left( p(A^{(n)}, B^{(n)}, \mathcal{I}) > p(A, B, \mathcal{I}) - \epsilon \right) > 1 - \delta/2.$

Combining this with (5.6) completes the proof of the lemma. □ 

In order to prove Theorem 5.2, we investigate the susceptibility sets of two uniformly 
at random chosen vertices in the subgraph $\hat{G}^{(n)}$ (of $G^{(n)}$), which is defined as follows. Let 
$\hat{\mathcal{A}}^{(n)}$ be constructed from $\mathcal{A}^{(n)}$ by ignoring all vertices in $V^{(n)}$ and $V'^{(n)}$ that have weights 
larger than $\log n$ and ignoring all edges that are incident to such vertices. The graph $\hat{G}^{(n)}$ 
is constructed from $\hat{\mathcal{A}}^{(n)}$ in the same way that $G^{(n)}$ is constructed from $\mathcal{A}^{(n)}$. 

We can create a realisation of $\hat{\mathcal{A}}^{(n)}$ as follows. Define the vertex sets $\hat{V}^{(n)} = (v_i \in V^{(n)} : 
A_i \le \log n)$ and $\hat{V}'^{(n)} = (v'_j \in V'^{(n)} : B_j \le \log n)$. Conditional upon the weights of the 
vertices in $\hat{\mathcal{A}}^{(n)}$, (i) vertices $v_i \in \hat{V}^{(n)}$ and $v'_j \in \hat{V}'^{(n)}$ share a $\mathcal{P}(A_i B_j/(\mu n))$ number 
of edges and (ii) the numbers of edges between distinct pairs of vertices are independent. 
Let 

$\hat{L}^{(n)} = \sum_{i=1}^{n} A_i 1(A_i \le \log n)$ and (5.7)

$\hat{L}'^{(n)} = \sum_{j=1}^{\lfloor \alpha n \rfloor} B_j 1(B_j \le \log n).$ (5.8)

Then the degree of vertex $v_i \in \hat{V}^{(n)}$ in $\hat{\mathcal{A}}^{(n)}$ is $\mathcal{P}(A_i \hat{L}'^{(n)}/(\mu n))$ and the degree of $v'_j \in \hat{V}'^{(n)}$ is 
$\mathcal{P}(B_j \hat{L}^{(n)}/(\mu n))$. We construct from $\hat{\mathcal{A}}^{(n)}$ an identically distributed copy of $\mathcal{A}^{(n)}$ by adding 
the vertices from $V^{(n)} \setminus \hat{V}^{(n)}$ and $V'^{(n)} \setminus \hat{V}'^{(n)}$ and, if $v_i \in V^{(n)}$ and $v'_j \in V'^{(n)}$ are not both in 
$\hat{\mathcal{A}}^{(n)}$, letting $v_i$ and $v'_j$ share a $\mathcal{P}(A_i B_j/(\mu n))$ number of newly-added edges, independently 
of the number of edges between other vertices. 

We compute the probability that the susceptibility sets of two vertices in $\hat{G}^{(n)}$ survive 
until at least generation 

$t_n = \lfloor \log \log n \rfloor.$ (5.9)

(Note that, as $n \to \infty$, if it survives, the total number of individuals in the branching 
process $\mathcal{Z}^b(A, B, \mathcal{I})$ in generations $0, 1, \ldots, t_n$ is of order $O(\log n)$ and a standard coupling 
argument, similar to that in the proof of Lemma 5.4, shows that, with probability tending to 
1 as $n \to \infty$, a susceptibility set process and its approximating branching process coincide 
over generations $0, 1, \ldots, t_n$. Thus for large $n$, if the susceptibility set process survives until 
generation $t_n$, its size will then be of order $O(\log n)$; cf. the discussion in Section 3.3.) 

Next, we show that, given any $\epsilon > 0$, there exists $K \in \mathbb{N}$ such that the probability that 
the $t_n$-th generation of an individual's susceptibility set is empty on $\hat{G}^{(n)}$ and the total 
size of its susceptibility set on $G^{(n)}$ exceeds $K$ is less than $\epsilon$ for all sufficiently large $n$; see 
Lemma 5.10. We then explore the forward process in $G^{(n)}$, where we ignore the vertices 
and cliques already explored in the two backward processes. We show that if the epidemic 
size is not $O(1)$, then, with probability tending to 1 as $n \to \infty$, it is $\Theta(n)$. After this we 
attempt to connect the forward process with the generation $t_n$ vertices of the backward 
processes and show that, in the event of a large outbreak, the probability that at least 1 
of the vertices in generation $t_n$ of a susceptibility set (if this generation is not empty) is 
ultimately recovered converges to 1 as $n \to \infty$. 

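We construct a coupling of two independent branching processes and the susceptibility
sets of $v_1$ and $v_2$ in $\hat G^{(n)}$ (which by exchangeability is equivalent to choosing two distinct
vertices uniformly at random), assuming that $A_1, A_2 \le \log n$. We therefore define (cf. equations
(2.5)-(2.8)) $\hat A_i^{(n)} = A_i \mathbb{1}(A_i \le \log n)\hat L'^{(n)}/(\mu n)$ and $\hat B_i^{(n)} = B_i \mathbb{1}(B_i \le \log n)\hat L^{(n)}/(\mu n)$,
and let $\hat c^{(n)} = \sum_{i=1}^{n} \mathbb{1}(A_i \le \log n)$ and $\hat c'^{(n)} = \sum_{i=1}^{\lfloor \alpha n \rfloor} \mathbb{1}(B_i \le \log n)$. The random variables
$\hat A^{(n)}$ and $\hat B^{(n)}$ are defined by

$$\mathrm{P}_\omega(\hat A^{(n)} \le x) = |\{1 \le i \le \hat c^{(n)} : \hat A_i^{(n)} \le x\}| / \hat c^{(n)} \quad (x > 0) \quad \text{and}$$
$$\mathrm{P}_\omega(\hat B^{(n)} \le x) = |\{1 \le i \le \hat c'^{(n)} : \hat B_i^{(n)} \le x\}| / \hat c'^{(n)} \quad (x > 0).$$

The processes through which the construction of the susceptibility set of $v_i$ ($i \in \{1,2\}$)
takes place are denoted by

$$\hat S^i = \hat S^i(\hat A^{(n)}, \hat B^{(n)}, \mathcal{I}) = (\hat S^i_j, j \in \mathbb{Z}_+).$$

The two independent branching processes are

$$Z^{b,i} = Z^{b,i}(\hat A^{(n)}, \hat B^{(n)}, \mathcal{I}), \quad \text{for } i \in \{1,2\},$$

where $\hat A^{(n)}$ and $\hat B^{(n)}$ are as above. The corresponding susceptibility set processes in
$G^{(n)}$ are denoted by $S^i$ for $i \in \{1,2\}$. When no confusion is possible, we sometimes suppress
the reference to the starting vertex $i$.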
We use the following lemmas. 

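Lemma 5.6. Let $0 < \epsilon < 3/e - 1$. For $k \in \mathbb{N}$, let $(X_i(k), i \in \mathbb{N})$ be a sequence of i.i.d.
$\mathcal{P}((1+\epsilon)\log k)$ random variables. Then, for every $C > 0$,

$$\mathrm{P}\Big(\max_{1 \le i \le \lfloor Ck \rfloor} X_i(k) < 3 \log k\Big) \to 1 \quad \text{as } k \to \infty.$$

Proof. Since $e^h = \sum_{i=0}^{\infty} h^i/i!$, we have $h! \ge h^h e^{-h}$. Then

$$\mathrm{P}(X_1(k) \ge 3 \log k) = \sum_{j = \lceil 3 \log k \rceil}^{\infty} \frac{((1+\epsilon)\log k)^j}{j!} \, k^{-(1+\epsilon)}$$
$$\le k^{-(1+\epsilon)} \sum_{j = \lceil 3 \log k \rceil}^{\infty} \frac{((1+\epsilon)\log k)^j}{j^j e^{-j}}$$
$$\le k^{-(1+\epsilon)} \sum_{j = \lceil 3 \log k \rceil}^{\infty} \big((1+\epsilon)e/3\big)^j$$
$$\le \frac{3}{3 - (1+\epsilon)e} \, k^{-1-\epsilon+3(1+\log[1+\epsilon]-\log 3)}.$$

The probability that none out of $\lfloor Ck \rfloor$ independent copies of $X_1(k)$ exceeds $3 \log k$ is thus
given by

$$\big(1 - \mathrm{P}(X_1(k) \ge 3 \log k)\big)^{\lfloor Ck \rfloor} \ge \Big(1 - \frac{3}{3 - (1+\epsilon)e} \, k^{-1-\epsilon+3(1+\log[1+\epsilon]-\log 3)}\Big)^{Ck}$$
$$\ge 1 - Ck \, \frac{3}{3 - (1+\epsilon)e} \, k^{-1-\epsilon+3(1+\log[1+\epsilon]-\log 3)}$$
$$= 1 - \frac{3C}{3 - (1+\epsilon)e} \, k^{3(1+\log[1+\epsilon]-\log 3)-\epsilon},$$

which converges to 1 as $k \to \infty$, since $0 < \epsilon < 3/e - 1$. □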
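
The tail estimate in this proof can be checked numerically. The sketch below is only an illustration, not part of the argument: the function names and the choice $\epsilon = 0.05$ are assumptions made here. It compares the exact Poisson tail $\mathrm{P}(X_1(k) \ge 3\log k)$ with the geometric-series bound and prints $k$ times the bound, which is the quantity that must vanish when $C = 1$.

```python
import math


def poisson_tail(lam: float, j0: int, terms: int = 400) -> float:
    """P(X >= j0) for X ~ Poisson(lam); pmf terms are computed in log-space."""
    return sum(
        math.exp(-lam + j * math.log(lam) - math.lgamma(j + 1))
        for j in range(j0, j0 + terms)
    )


def tail_bound(k: int, eps: float) -> float:
    """Geometric-series bound on P(X_1(k) >= 3 log k) from the proof of Lemma 5.6."""
    exponent = -1.0 - eps + 3.0 * (1.0 + math.log(1.0 + eps) - math.log(3.0))
    return 3.0 / (3.0 - (1.0 + eps) * math.e) * k ** exponent


def check(k: int, eps: float) -> tuple[float, float]:
    """Return (exact tail, bound) for X_1(k) ~ Poisson((1 + eps) log k), k >= 2."""
    lam = (1.0 + eps) * math.log(k)
    j0 = math.ceil(3.0 * math.log(k))
    return poisson_tail(lam, j0), tail_bound(k, eps)


if __name__ == "__main__":
    # eps = 0.05 is an illustrative value satisfying 0 < eps < 3/e - 1.
    for k in (2, 10, 100, 1000, 10_000):
        exact, bound = check(k, eps=0.05)
        print(k, exact, bound, k * bound)
```

For $\epsilon$ close to $3/e - 1$ the exponent of $k$ in the union bound is close to zero, so the convergence in the lemma can be very slow.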

Recall that the distance between two vertices in a graph is the number of edges in the 
shortest path connecting those vertices. 

Lemma 5.7. For $\nu$-almost all $\omega \in \Omega$, the probability that the total number and the total
weight of vertices within distance $2t_n$ of the set $\{v_1, v_2\}$ in $\hat A^{(n)}$ are both smaller than $n^{1/3}$
converges to 1 as $n \to \infty$.

Proof. All vertices in $\hat A^{(n)}$ have weight at most $\log n$, so their degrees in $\hat A^{(n)}$ are stochastically
dominated by i.i.d. $\mathcal{P}(\log n \, \max(L^{(n)}, L'^{(n)})/(\mu n))$ random variables. For every $\epsilon > 0$,
we have by the strong law of large numbers that $\mathbb{1}(\max(L^{(n)}, L'^{(n)})/(\mu n) < 1 + \epsilon) \to 1$ as
$n \to \infty$. We know by Lemma 5.6 that, with probability tending to 1 as $n \to \infty$, none of the
at most $n + \lfloor \alpha n \rfloor$ vertices in $\hat A^{(n)}$ has degree exceeding $3 \log n$. So, using a straightforward
branching process approximation, the number of vertices within graph distance $2t_n$ of $v_1$
and $v_2$ is, with probability tending to 1 as $n \to \infty$, bounded above by

$$2 \sum_{k=1}^{2t_n} (3 \log n)^k = O\big((3 \log n)^{2t_n+1}\big).$$

Since $2t_n + 1 = 2\lceil \log\log n \rceil + 1 \le 2 \log\log n + 3$, we have

$$(3 \log n)^{2t_n+1} \le (3 \log n)^{3 + 2\log\log n} = (3 \log n)^3 e^{2 \log\log n (\log 3 + \log\log n)} = o(n^{1/3}/\log n),$$

so the total weight of the vertices is $o(n^{1/3})$. □
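
The final estimate is purely asymptotic, and it can help to see how large $n$ must be before it takes effect. The sketch below is an illustration only; it assumes $t_n = \lceil \log\log n \rceil$ and works with $\log n$ as the input so that very large $n$ can be handled. It evaluates the logarithm of $(3\log n)^{2t_n+1} \log n / n^{1/3}$, which must tend to $-\infty$.

```python
import math


def log_bound_ratio(log_n: float) -> float:
    """log of (3 log n)^(2 t_n + 1) * log n / n^(1/3), with t_n = ceil(log log n).

    The argument is log n rather than n, so that astronomically large n can be used.
    """
    t_n = math.ceil(math.log(log_n))
    return (2 * t_n + 1) * math.log(3.0 * log_n) + math.log(log_n) - log_n / 3.0


if __name__ == "__main__":
    for log_n in (10.0, 100.0, 300.0, 1000.0, 1e6):
        print(log_n, log_bound_ratio(log_n))
```

Under these assumptions the ratio is still larger than 1 at $\log n = 100$ and only drops below 1 for $\log n$ of a few hundred, so the bound $n^{1/3}$ in Lemma 5.7 is informative only for extremely large populations.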

For $i \in \{1, 2\}$, let $K^i(t_n)$ be the set of vertices in $\hat V^{(n)}$ within distance $2t_n$ of $v_i$ in $\hat A^{(n)}$
and let $K'^i(t_n)$ be the set of vertices in $\hat V'^{(n)}$ within distance $2t_n$ of $v_i$ in $\hat A^{(n)}$. Lemma 5.7
implies that, with probability tending to 1 as $n \to \infty$, none of the sets $K^1(t_n)$, $K^2(t_n)$,
$K'^1(t_n)$ and $K'^2(t_n)$ has total vertex or clique weight exceeding $n^{1/3}$. Furthermore, with
probability tending to 1 as $n \to \infty$, the total number of vertices in $K^1(t_n)$ is less than $n^{1/3}$.
Conditioned on $K^2(t_n)$ having total weight less than $n^{1/3}$ and $K^1(t_n)$ containing less than
$n^{1/3}$ vertices, the probability that $K^1(t_n)$ and $K^2(t_n)$ share an edge is bounded above by
$1 - (1 - n^{1/3}/L^{(n)})^{n^{1/3}} \le n^{2/3}/L^{(n)}$, which converges $\nu$-almost surely to 0 as $n \to \infty$. So, for
$\nu$-almost all $\omega \in \Omega$, the $\mathrm{P}_\omega$-probability that $K^1$ and $K^2$ share a vertex converges to 0 as
$n \to \infty$. Similarly, we deduce that for $\nu$-almost all $\omega \in \Omega$, the $\mathrm{P}_\omega$-probability that $K'^1$ and
$K'^2$ share a clique converges to 0 as $n \to \infty$.

Recall the definition of $R_*$ from (3.1) and write $R_*$ as $R_*(A, B, \mathcal{I})$ to show explicitly its
dependence on the distributions of $A$, $B$ and $\mathcal{I}$.

Lemma 5.8. For $0 < c < \log R_*$, it holds that

$$\mathrm{P}_\omega\big(|\hat S_{t_n}| > (\log n)^c \,\big|\, |\hat S_{t_n}| > 0\big) \to 1 \quad \text{as } n \to \infty.$$

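Proof. By Lemma 5.7 and standard coupling arguments, similar to those used in the proof
of Lemma 5.4, we can replace $\hat S$ by the branching process $Z^b(\hat A^{(n)}, \hat B^{(n)}, \mathcal{I})$.

For $n \in \mathbb{N}$, let $\hat A_*^{(n)}$ be a random variable having distribution function given by

$$\mathrm{P}_\omega(\hat A_*^{(n)} \le x) = \sup_{i \ge n} \mathrm{P}_\omega(\hat A^{(i)} \le x) \quad (x \in \mathbb{R})$$

and define $\hat B_*^{(n)}$ similarly. Observe that $\hat A_*^{(n)} \Rightarrow A$ and $\hat B_*^{(n)} \Rightarrow B$ as $n \to \infty$. Furthermore,
for all $n \in \mathbb{N}$, $\hat A_*^{(n)}$ (respectively, $\hat B_*^{(n)}$) is stochastically dominated by $\hat A_*^{(n+1)}$ (respectively,
$\hat B_*^{(n+1)}$). Therefore $R_*(\hat A_*^{(n)}, \hat B_*^{(n)}, \mathcal{I})$ is also stochastically increasing in $n$. By the Skorokhod
representation theorem [18, Theorem 7.2.14] and the monotone convergence theorem we
have that

$$R_*(\hat A_*^{(n)}, \hat B_*^{(n)}, \mathcal{I}) \to R_*(A, B, \mathcal{I}) \quad \text{as } n \to \infty.$$

In particular, there exists $N = N(\omega)$ such that $R_*(\hat A_*^{(n)}, \hat B_*^{(n)}, \mathcal{I}) > e^c$, for every $n \ge N$. So,
by [20, Theorem 2.7.1] it follows that

$$\mathrm{P}_\omega\big(|Z^b_{t_n}(\hat A_*^{(n)}, \hat B_*^{(n)}, \mathcal{I})| > (\log n)^c\big) - \mathrm{P}_\omega\big(|Z^b_{t_n}(\hat A_*^{(n)}, \hat B_*^{(n)}, \mathcal{I})| > 0\big) \to 0 \quad \text{as } n \to \infty.$$

The second probability in this expression converges to $p^b(A, B, \mathcal{I})$ by [12, Lemma 4.1] and
the lemma then follows by observing that $|Z^b_{t_n}(\hat A_*^{(n)}, \hat B_*^{(n)}, \mathcal{I})|$ is stochastically smaller than
$|Z^b_{t_n}(\hat A^{(n)}, \hat B^{(n)}, \mathcal{I})|$. □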

Up to now, we have investigated the behavior of the susceptibility sets of vertices in
$\hat G^{(n)}$. This is only an intermediate step before analyzing susceptibility sets in $G^{(n)}$. To
make the connection between the two graphs we use the following two lemmas.

Lemma 5.9. For $k \in \mathbb{N}$,

$$\mathrm{P}_\omega\big(|\hat S(\hat A^{(n)}, \hat B^{(n)}, \mathcal{I})| = k\big) - \mathrm{P}_\omega\big(|S(A^{(n)}, B^{(n)}, \mathcal{I})| = k\big) \to 0 \quad \text{as } n \to \infty.$$

Proof. In order to simplify the notation we suppress the explicit dependence on $A^{(n)}$, $B^{(n)}$
and $\mathcal{I}$. We denote by $S'$ the set of cliques containing vertices in the susceptibility set $S$.
We prove that

$$\mathrm{P}_\omega(|\hat S| = k, |\hat S'| = l) - \mathrm{P}_\omega(|S| = k, |S'| = l) \to 0 \quad \text{as } n \to \infty, \qquad (5.10)$$

from which the lemma follows using similar arguments to those in the proof of Lemma 5.3,
which are not repeated here.

Recall that we can construct $G^{(n)}$ from $\hat G^{(n)}$ by considering the vertices in $V^{(n)} \setminus \hat V^{(n)}$
and $V'^{(n)} \setminus \hat V'^{(n)}$ and then connecting them in the usual way with each other and with
vertices in $\hat V^{(n)}$ and $\hat V'^{(n)}$ to obtain $A^{(n)}$. As in the proof of Lemma 5.4, $\mu < \infty$ implies
that

$$\sum_{i=1}^{n} A_i \mathbb{1}(A_i > \log n) = L^{(n)} - \hat L^{(n)} = o(n) \quad \nu\text{-almost surely}.$$

Therefore,

$$\frac{L^{(n)} - \hat L^{(n)}}{L^{(n)}} \to 0 \quad \nu\text{-almost surely as } n \to \infty.$$

This implies that $1 - \hat L^{(n)}/L^{(n)}$ converges in probability to 0. In particular there is an
increasing sequence of natural numbers $(p_i, i \in \mathbb{N})$, such that for all $n > p_i$, we have
$\nu(1 - \hat L^{(n)}/L^{(n)} < 4^{-i}) > 1 - 2^{-i}$. Define the function $\xi : \mathbb{N} \to \mathbb{N}$ by $\xi(n) = 2^i$ if
$p_i \le n < p_{i+1}$. This function increases to infinity and

$$\mathbb{1}\big(L^{(n)} - \hat L^{(n)} < (\xi(n))^{-1} L^{(n)}\big) \to 1. \qquad (5.11)$$

Similarly, there exists a function $\xi'(n)$ which increases to $\infty$, such that

$$\mathbb{1}\big(L'^{(n)} - \hat L'^{(n)} < (\xi'(n))^{-1} L'^{(n)}\big) \to 1. \qquad (5.12)$$

Let (respectively, ) be the weight of the first k vertices from (respectively, 
V'W) explored in S. Note that 



P,„ 151 = jfe, IS'I = I 



^ ) ) >(ev)) 1/2 uL^>(e(n))^)-^o, 



since if the conditioning event occurs then the probability that the susceptibility set does 
not extend further goes to as n — > oo. It follows that 

(\S\ = k, \S'\ = I, < (£' W) 1/2 , l§ ) < (an)) 1/2 ) - P w (|5| = k, \S'\ = I) 

-^0. (5.13) 

Given to, when constructing the graph G^ n > from G^ n \ the expected number of newly-added 
edges between the first k vertices from V^ n > explored in S and V ^ \ V ^ is 

jp(n) _ (k)\ > 



fin 

(k) 



Suppose that < (£'W) 1/2 . Then 



*k < KK W) L/(n) ^ , 
which, together with (5.12) and the fact that L'( n >/(n[i) — ^ 1 as n — > oo, yields 

jf^fio < (an)) i/2 ) 0. 

Combining this, and a corresponding result for the number of newly-added edges between 
the first I vertices from V ^ explored in S and \ V^ n \ with (5.13) establishes that 

p w (\s\ = k, \s'\ = i, s n (r (n) \ v (n) ) ^ 0, 5' n (v ,(n) \ v /(n) ) ^ 0) 0, 

which completes the proof of (5.10) and thus of the lemma. □

Lemma 5.10. For every $\epsilon > 0$ there exists $K \in \mathbb{N}$ such that

$$\mathbb{1}\Big(\mathrm{P}_\omega\big(|\hat S_{t_n}(\hat A^{(n)}, \hat B^{(n)}, \mathcal{I})| = 0, |S(A^{(n)}, B^{(n)}, \mathcal{I})| > K\big) < \epsilon\Big) \to 1 \quad \text{as } n \to \infty.$$

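Proof. For ease of presentation we suppress the dependence on the distributions of the
weights and infectious periods, writing $S$ for $S(A^{(n)}, B^{(n)}, \mathcal{I})$ and $\hat S$ for $\hat S(\hat A^{(n)}, \hat B^{(n)}, \mathcal{I})$.
First note that, as in the proof of Lemma 5.8, we can use branching process approximations
to show that for every $K \in \mathbb{N}$ we have

$$\mathrm{P}_\omega(|\hat S_{t_n}| = 0, |\hat S| > K) - \mathrm{P}_\omega\big(|Z^b_{t_n}(\hat A^{(n)}, \hat B^{(n)}, \mathcal{I})| = 0, |Z^b(\hat A^{(n)}, \hat B^{(n)}, \mathcal{I})| > K\big) \to 0 \quad \text{as } n \to \infty. \qquad (5.14)$$

Now,

$$\mathrm{P}_\omega\big(|Z^b_{t_n}(\hat A^{(n)}, \hat B^{(n)}, \mathcal{I})| = 0, |Z^b(\hat A^{(n)}, \hat B^{(n)}, \mathcal{I})| > K\big)$$
$$= \mathrm{P}_\omega\big(|Z^b(\hat A^{(n)}, \hat B^{(n)}, \mathcal{I})| > K\big) - \mathrm{P}_\omega\big(|Z^b_{t_n}(\hat A^{(n)}, \hat B^{(n)}, \mathcal{I})| > 0, |Z^b(\hat A^{(n)}, \hat B^{(n)}, \mathcal{I})| > K\big)$$
$$= \mathrm{P}_\omega\big(|Z^b(\hat A^{(n)}, \hat B^{(n)}, \mathcal{I})| > K\big) - \mathrm{P}_\omega\big(|Z^b_{t_n}(\hat A^{(n)}, \hat B^{(n)}, \mathcal{I})| > 0\big), \qquad (5.15)$$

for all sufficiently large $n$, since $|Z^b_{t_n}(\hat A^{(n)}, \hat B^{(n)}, \mathcal{I})| > 0$ implies that $|Z^b(\hat A^{(n)}, \hat B^{(n)}, \mathcal{I})| > t_n$.
Arguing as in the proof of Lemma 5.3 shows that

$$\mathrm{P}_\omega\big(|Z^b(\hat A^{(n)}, \hat B^{(n)}, \mathcal{I})| > K\big) \to \mathrm{P}\big(|Z^b(A, B, \mathcal{I})| > K\big) \quad \text{as } n \to \infty. \qquad (5.16)$$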

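To deal with the second term on the right hand side of (5.15), observe that

$$\mathrm{P}_\omega\big(|Z^b_{t_n}(\hat A^{(n)}, \hat B^{(n)}, \mathcal{I})| > 0\big)$$
$$= \mathrm{P}_\omega\big(|Z^b(\hat A^{(n)}, \hat B^{(n)}, \mathcal{I})| = \infty\big) + \mathrm{P}_\omega\big(|Z^b_{t_n}(\hat A^{(n)}, \hat B^{(n)}, \mathcal{I})| > 0, |Z^b(\hat A^{(n)}, \hat B^{(n)}, \mathcal{I})| < \infty\big)$$
$$\le \mathrm{P}_\omega\big(|Z^b(\hat A^{(n)}, \hat B^{(n)}, \mathcal{I})| = \infty\big) + \mathrm{P}_\omega\big(|Z^b(\hat A^{(n)}, \hat B^{(n)}, \mathcal{I})| \in (t_n, \infty)\big). \qquad (5.17)$$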

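Now, given any $\epsilon > 0$, there exists $L \in \mathbb{N}$ such that $\mathrm{P}(|Z^b(A, B, \mathcal{I})| \in (L, \infty)) < \epsilon$.
(If $R_* \le 1$ then $|Z^b|$ is almost surely finite and the statement follows immediately. If
$R_* > 1$, the statement follows by writing $\mathrm{P}(|Z^b| \in (L, \infty)) = (1 - p^b)\,\mathrm{P}(|Z^b| \in (L, \infty) \mid |Z^b| < \infty)$
and using the fact that a supercritical Galton-Watson process conditioned on extinction
is probabilistically equivalent to an associated subcritical Galton-Watson process [13].)
Further, (5.16) and [12, Lemma 4.1] imply that

$$\mathrm{P}_\omega\big(|Z^b(\hat A^{(n)}, \hat B^{(n)}, \mathcal{I})| \in (L, \infty)\big) \to \mathrm{P}\big(|Z^b(A, B, \mathcal{I})| \in (L, \infty)\big) \quad \text{as } n \to \infty,$$

so

$$\mathbb{1}\Big(\mathrm{P}_\omega\big(|Z^b(\hat A^{(n)}, \hat B^{(n)}, \mathcal{I})| \in (L, \infty)\big) < \epsilon\Big) \to 1 \quad \text{as } n \to \infty,$$

which implies that

$$\mathbb{1}\Big(\mathrm{P}_\omega\big(|Z^b(\hat A^{(n)}, \hat B^{(n)}, \mathcal{I})| \in (t_n, \infty)\big) < \epsilon\Big) \to 1 \quad \text{as } n \to \infty.$$

As this holds for any $\epsilon > 0$, it follows from (5.15), (5.16) and (5.17), with another application
of [12, Lemma 4.1], that

$$\mathrm{P}_\omega\big(|Z^b_{t_n}(\hat A^{(n)}, \hat B^{(n)}, \mathcal{I})| = 0, |Z^b(\hat A^{(n)}, \hat B^{(n)}, \mathcal{I})| > K\big) \to \mathrm{P}\big(|Z^b(A, B, \mathcal{I})| \in (K, \infty)\big) \quad \text{as } n \to \infty. \qquad (5.18)$$

Now $\mathrm{P}(|Z^b(A, B, \mathcal{I})| \in (K, \infty))$ can be made arbitrarily close to 0 by choosing $K$
sufficiently large. Thus (5.14) and (5.18) imply that, for every $\epsilon > 0$, we can choose $K \in \mathbb{N}$
such that

$$\mathbb{1}\big(\mathrm{P}_\omega(|\hat S_{t_n}| = 0, |\hat S| > K) < \epsilon\big) \to 1 \quad \text{as } n \to \infty. \qquad (5.19)$$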



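Finally, note that

$$\mathrm{P}_\omega(|\hat S_{t_n}| = 0, |\hat S| > K) = \mathrm{P}_\omega(|\hat S_{t_n}| = 0) - \mathrm{P}_\omega(|\hat S_{t_n}| = 0, |\hat S| \le K) = \mathrm{P}_\omega(|\hat S_{t_n}| = 0) - \mathrm{P}_\omega(|\hat S| \le K)$$

for all sufficiently large $n$. Similarly, since $|S| \ge |\hat S|$,

$$\mathrm{P}_\omega(|\hat S_{t_n}| = 0, |S| > K) = \mathrm{P}_\omega(|\hat S_{t_n}| = 0) - \mathrm{P}_\omega(|S| \le K)$$

for all sufficiently large $n$. Hence, by Lemma 5.9,

$$\mathrm{P}_\omega(|\hat S_{t_n}| = 0, |S| > K) - \mathrm{P}_\omega(|\hat S_{t_n}| = 0, |\hat S| > K) \to 0 \quad \text{as } n \to \infty,$$

whence the lemma follows from (5.19). □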

For the remainder of the proof of Theorem 5.2, we re-analyze an exploration process of 
the forward epidemic process and we couple it to a multi-type branching process, such that 
the epidemic process is bigger than the branching process for as long as the total weight of 
both the vertices and the clique vertices in the exploration process is less than a predefined 
fraction of the total weight. The survival probability of this branching process can be made 
arbitrarily close to the probability of a large outbreak as n — > oo. After that we 'glue' the 
susceptibility sets, if they are large, to the forward epidemic process. 

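We need some extra notation. Since the weights of the vertices are exchangeable, the
model does not change if we order the vertices such that $A_i \le A_{i+1}$ and $B_j \le B_{j+1}$ for
$1 \le i < n$ and $1 \le j < \lfloor \alpha n \rfloor$. For $\gamma \in (0, 1)$, we define

$$R^{(n)}(\gamma) = \inf\Big\{ i \le n : \frac{\sum_{j=1}^{i} A_j}{L^{(n)}} > 1 - \gamma \Big\} \quad \text{and}$$
$$R'^{(n)}(\gamma) = \inf\Big\{ i \le \lfloor \alpha n \rfloor : \frac{\sum_{j=1}^{i} B_j}{L'^{(n)}} > 1 - \gamma \Big\}.$$

Furthermore, define

$$\bar\gamma = \bar\gamma(\gamma, n) = 1 - \frac{\sum_{i=1}^{R^{(n)}(\gamma)} A_i}{L^{(n)}} \quad \text{and} \quad \bar\gamma' = \bar\gamma'(\gamma, n) = 1 - \frac{\sum_{i=1}^{R'^{(n)}(\gamma)} B_i}{L'^{(n)}}.$$

We claim that, for $\gamma \in (0,1)$, $\bar\gamma \to \gamma$ as $n \to \infty$. This can be seen by the following reasoning. Let
$x = \inf(y > 0 : \mu^{-1} \mathrm{E}[A \mathbb{1}(A \le y)] > 1 - \gamma/2)$. Then $x$ is finite, since $\mu = \mathrm{E}[A] < \infty$. By
the strong law of large numbers, we have $n^{-1} \sum_{i=1}^{n} A_i \mathbb{1}(A_i \le x) \to \mathrm{E}[A \mathbb{1}(A \le x)]$ and
$n^{-1} L^{(n)} \to \mu$ $\nu$-almost surely as $n \to \infty$. Thus,

$$\frac{\sum_{i=1}^{n} A_i \mathbb{1}(A_i \le x)}{L^{(n)}} \to \mu^{-1} \mathrm{E}[A \mathbb{1}(A \le x)] \ge 1 - \gamma/2$$

as $n \to \infty$, whence $\nu(A_{R^{(n)}(\gamma)} \le x) \to 1$ as $n \to \infty$. Combining this with

$$1 - \bar\gamma = \frac{\sum_{i=1}^{R^{(n)}(\gamma)} A_i}{L^{(n)}} > 1 - \gamma$$

and

$$1 - \bar\gamma - \frac{A_{R^{(n)}(\gamma)}}{L^{(n)}} = \frac{\sum_{i=1}^{R^{(n)}(\gamma) - 1} A_i}{L^{(n)}} \le 1 - \gamma$$

completes the proof of the claim. Similarly we can prove that $\bar\gamma' \to \gamma$ as $n \to \infty$. This also shows
that the vertices in $V^{(n)} \setminus \hat V^{(n)}$ (respectively, $V'^{(n)} \setminus \hat V'^{(n)}$) all have labels exceeding $R^{(n)}(\gamma)$
(respectively, $R'^{(n)}(\gamma)$) with probability tending to 1 as $n \to \infty$.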
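
The cut-off label and the realised weight fraction above it are easy to compute for a concrete weight sample. The sketch below is an illustration under stated assumptions: the function name `cutoff_label`, the exponential weight distribution and the seed are chosen here and do not come from the paper. It computes the smallest label whose cumulative weight fraction exceeds $1 - \gamma$, together with one minus that cumulative fraction, and checks the two inequalities used in the proof of the claim.

```python
import random
from itertools import accumulate


def cutoff_label(weights: list[float], gamma: float) -> tuple[int, float]:
    """Return (R, gamma_bar) for weights sorted in increasing order.

    R is the smallest 1-based label whose cumulative weight fraction exceeds
    1 - gamma; gamma_bar is one minus that cumulative fraction.
    """
    total = sum(weights)
    for label, partial in enumerate(accumulate(weights), start=1):
        if partial / total > 1.0 - gamma:
            return label, 1.0 - partial / total
    raise ValueError("gamma must lie in (0, 1)")


if __name__ == "__main__":
    rng = random.Random(1)  # illustrative seed
    gamma = 0.2
    for n in (10, 100, 10_000):
        # Exponential(1) weights are an assumption made for this example only.
        weights = sorted(rng.expovariate(1.0) for _ in range(n))
        label, gamma_bar = cutoff_label(weights, gamma)
        jump = weights[label - 1] / sum(weights)
        # gamma_bar < gamma <= gamma_bar + jump, and jump shrinks as n grows.
        print(n, label, gamma_bar, jump)
```

The gap between the realised fraction and $\gamma$ is at most the weight fraction of the single vertex at the cut-off, which is why the claim reduces to showing that this vertex has bounded weight with high probability.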

For $c_1 > 0$, let $I(c_1)$ be the set of vertices with type/infectious period less than $c_1$. Let
$\mathcal{I}(c_1)$ denote a random variable having distribution function given by $\mathrm{P}(\mathcal{I}(c_1) \le x) = \mathrm{P}(\mathcal{I} \le x \mid \mathcal{I} > c_1)$,
for $x > c_1$. We use the multi-type branching process $Z^f(A^{(n)}, B^{(n)}, \mathcal{I}(c_1), \gamma)$,
which is obtained from $Z^f(A^{(n)}, B^{(n)}, \mathcal{I}(c_1))$ by:

(i) Killing upon birth all children with A-weight strictly larger than the weight of vertex
$R^{(n)}(\gamma)$. Children with A-weight equal to the weight of vertex $R^{(n)}(\gamma)$ are killed
independently with probability given by the fraction of those vertices in $V^{(n)}$ having
weight equal to the weight of vertex $R^{(n)}(\gamma)$ that also have label strictly larger than
$R^{(n)}(\gamma)$.

(ii) Killing upon birth all litters corresponding to local epidemics in cliques with B-weight
strictly larger than the weight of clique $R'^{(n)}(\gamma)$. Cliques with B-weight equal
to the weight of clique $R'^{(n)}(\gamma)$ are killed independently with probability given by
the fraction of those vertices in $V'^{(n)}$ having B-weight equal to the weight of clique
$R'^{(n)}(\gamma)$ that also have label strictly larger than $R'^{(n)}(\gamma)$.

If $A_1, A_2, \dots, A_n$ are distinct, which happens $\nu$-almost surely if the distribution of $A$ has no
atoms, then (i) reduces to killing upon birth all children with A-weight strictly larger than
the weight of vertex $R^{(n)}(\gamma)$. If $B_1, B_2, \dots, B_{\lfloor \alpha n \rfloor}$ are distinct then (ii) simplifies similarly.

We observe that the corresponding survival probability function (cf. Section 4)
$p(x; A^{(n)}, B^{(n)}, \mathcal{I}(c_1), \gamma)$ increases as $\gamma \downarrow 0$. Thus, the limit function, as $\gamma \downarrow 0$, exists
and satisfies (4.3) by the monotone convergence theorem. Invoking Lemma 4.1, this limit
function is

$$\lim_{\gamma \downarrow 0} p(x; A^{(n)}, B^{(n)}, \mathcal{I}(c_1), \gamma) = p(x; A^{(n)}, B^{(n)}, \mathcal{I}(c_1)).$$

Similarly, since $p(x; A^{(n)}, B^{(n)}, \mathcal{I}(c_1))$ is decreasing as $c_1 \downarrow 0$, one can show that

$$\lim_{c_1 \downarrow 0} p(x; A^{(n)}, B^{(n)}, \mathcal{I}(c_1)) = p(x; A^{(n)}, B^{(n)}, \mathcal{I}).$$

For $p(A^{(n)}, B^{(n)}, \mathcal{I})$ as in Section 4, this leads to the first assertion of the following lemma.
The second assertion then follows using Lemma 5.5.

Lemma 5.11. For every $\epsilon > 0$, $\omega \in \Omega$ and $n \in \mathbb{N}$, there exist $\gamma > 0$ and $c_1 > 0$ small
enough such that

$$|p(A^{(n)}, B^{(n)}, \mathcal{I}(c_1), \gamma) - p(A^{(n)}, B^{(n)}, \mathcal{I})| < \epsilon/2.$$

For every $\epsilon > 0$, there exist $\gamma > 0$ and $c_1 > 0$ such that

$$\mathbb{1}\big(|p(A^{(n)}, B^{(n)}, \mathcal{I}(c_1), \gamma) - p(A, B, \mathcal{I})| < \epsilon\big) \to 1 \quad \text{as } n \to \infty.$$

Let $c_1 > 0$ and $\gamma > 0$ be constants. We consider the forward epidemic process
$\mathcal{R}^{(n)}(\omega, \mathcal{I}, c_1, \gamma/3)$, which is obtained from $\mathcal{R}^{(n)}(\omega, \mathcal{I})$ by removing all vertices (and adjacent
edges) in $I(c_1)$, $K^1(t_n)$ and $K^2(t_n)$ and not allowing for contacts in the cliques $K'^1(t_n)$ and
$K'^2(t_n)$ or in cliques with label $R'^{(n)}(\gamma/3)$ or larger. As before, we deduce that for every
$\gamma > 0$ and large enough $n$, all vertices in $V'^{(n)} \setminus \hat V'^{(n)}$ have label at least $R'^{(n)}(\gamma/3)$, with
probability arbitrarily close to 1. Also define $\tilde{\mathcal{R}}^{(n)} = \mathcal{R}^{(n,0)} = \mathcal{R}^{(n)}(\omega, \mathcal{I}, c_1, 0)$ and let the
total weight of the cliques in $\tilde{\mathcal{R}}^{(n)}$ (i.e. in the set of ultimately recovered vertices in $\tilde{\mathcal{R}}^{(n)}$)
be denoted by $\mathcal{W}'^{(n)}(c_1)$.

Lemma 5.12. For every $\epsilon > 0$, there exist constants $\eta > 0$ and $c_1 > 0$, such that

$$\mathbb{1}\big(\mathrm{P}_\omega(\mathcal{W}'^{(n)}(c_1) > \eta n) - (p(A, B, \mathcal{I}) - \epsilon) > 0\big) \to 1 \quad \text{as } n \to \infty.$$

Proof. We explore $\mathcal{R}^{(n,\gamma)}$ vertex by vertex (and clique by clique) and couple this with an
exploration process of the tree of the branching process

$$Z^{(n,\gamma)} = Z^f(A^{(n)}, B^{(n)}, \mathcal{I}(c_1), \gamma).$$

With some abuse of notation we use $\mathcal{R}^{(n,\gamma)}$ and $Z^{(n,\gamma)}$ for the exploration processes as well.

We choose one vertex uniformly at random from $V^{(n)}$. We assume that this vertex is
not in $K^1(t_n)$ or $K^2(t_n)$ and that its type/infectious period exceeds $c_1$. The probability
that this assumption is met can be made arbitrarily close to 1 by choosing $n$ large enough
and $c_1$ small enough. Denote this vertex by $v_0$. Define the 'forbidden sets' of vertices by

$$\Gamma_0 = K^1(t_n) \cup K^2(t_n) \cup I(c_1) \cup (V^{(n)} \setminus \hat V^{(n)}) \cup \{v_0\} \quad \text{and}$$
$$\Gamma'_0 = K'^1(t_n) \cup K'^2(t_n) \cup \{v'_i \in V'^{(n)} : i \ge R'^{(n)}(\gamma/3)\}.$$

For the vertices in $V^{(n)} \setminus \Gamma_0$, we re-randomize the infectious period in such a way that, for
every vertex in $V^{(n)} \setminus \Gamma_0$, we let it be an independent random variable with distribution
$\mathcal{I}(c_1)$. This will not affect the distribution of the processes.

Let $\sigma_0^{(n)}(i)$ be a relabeling of the vertices in $V^{(n)}$ such that if $v_j \in \Gamma_0$ and $v_i \in V^{(n)} \setminus \Gamma_0$,
then $\sigma_0^{(n)}(i) < \sigma_0^{(n)}(j)$, while if $v_i, v_j \in V^{(n)} \setminus \Gamma_0$, then $\sigma_0^{(n)}(i) < \sigma_0^{(n)}(j)$ if $i < j$. The precise
order of the labels of the vertices in the forbidden set is not important. Define $\sigma_0'^{(n)}(i)$
similarly.

The A-weight and type of $v_0$ are also assigned to the ancestor of $Z^{(n,\gamma)}$, say that the
A-weight is $a_0$. Then we use a $\mathcal{P}(a_0 L'^{(n)}/(\mu n))$ random variable, $d_0$, to denote the 'maximal'
number of cliques vertex $v_0$ is part of and, coupled to this, the 'maximal' number of child
cliques the vertex has in $Z^{(n,\gamma)}$. The meaning of maximal is clarified below.

We now identify the first child clique. Choose a real number, $x'$ say, uniformly at
random from the unit interval. In $\mathcal{R}^{(n,\gamma)}$ we try to connect vertex $v_0$ to the clique with
label $i$, which satisfies

$$\sum_{j \in \mathbb{N} : \sigma_0'^{(n)}(j) < \sigma_0'^{(n)}(i)} B_j < x' L'^{(n)} \le \sum_{j \in \mathbb{N} : \sigma_0'^{(n)}(j) \le \sigma_0'^{(n)}(i)} B_j.$$

Let this vertex be $v'_1$. The B-weight of the corresponding possible litter in $Z^{(n,\gamma)}$ is $B_i$,
where $i$ is such that $\sum_{k=1}^{i-1} B_k < x' L'^{(n)} \le \sum_{j=1}^{i} B_j$. If $v'_1 \in \Gamma'_0$, then the clique is ignored in
$\mathcal{R}^{(n,\gamma)}$. If $x' > 1 - \gamma$, then the litter in $Z^{(n,\gamma)}$ is ignored. We note that as long as the weight
of $\Gamma'_0$ is less than $\gamma L'^{(n)}$, a clique can be ignored in $\mathcal{R}^{(n,\gamma)}$ only if the corresponding litter in
$Z^{(n,\gamma)}$ is also ignored. Furthermore, the B-weight of the litter in $Z^{(n,\gamma)}$ is not larger than
the B-weight of the clique in $\mathcal{R}^{(n,\gamma)}$.

Let the label of $v'_1$ be $k$. We now define

$$\sigma_1'^{(n)}(i) = \begin{cases} \sigma_0'^{(n)}(i), & \text{for } i \text{ such that } \sigma_0'^{(n)}(i) < \sigma_0'^{(n)}(k), \\ \sigma_0'^{(n)}(i) - 1, & \text{for } i \text{ such that } \sigma_0'^{(n)}(i) > \sigma_0'^{(n)}(k), \\ \lfloor \alpha n \rfloor, & \text{for } i = k. \end{cases}$$

That is, we give $v'_1$ the maximal label and keep the order of the labels of the other vertices.
Furthermore, we add $v'_1$ to the forbidden set, i.e. set $\Gamma'_1 = \Gamma'_0 \cup \{v'_1\}$. We choose the next
clique in $\mathcal{R}^{(n,\gamma)}$ and corresponding litter in $Z^{(n,\gamma)}$, say $v'_2$, in the same way as we choose $v'_1$,
with $\sigma_0'^{(n)}$ replaced by $\sigma_1'^{(n)}$ and $\Gamma'_0$ replaced by $\Gamma'_1$; and we continue this process until we
have identified all cliques that $v_0$ is part of.
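
One coupled draw of this construction can be sketched in code. The example below is a simplified illustration, not the authors' construction in full: the function names, the 0-based labels and the small weight vector are assumptions made here. A single uniform number selects a label in the epidemic (where forbidden labels are moved to the end of the ordering) and a label in the branching process (natural ordering, with the draw ignored when it exceeds $1 - \gamma$).

```python
from bisect import bisect_left
from itertools import accumulate


def pick_label(weights_in_order: list[float], x: float) -> int:
    """Return the 0-based position i with sum(w[:i]) < x * total <= sum(w[:i + 1])."""
    cumulative = list(accumulate(weights_in_order))
    return bisect_left(cumulative, x * cumulative[-1])


def coupled_pick(weights, forbidden, x, gamma):
    """One coupled draw using the same uniform number x.

    Returns (epidemic_label, branching_label); an entry is None when the
    corresponding process ignores the draw. Weights are sorted increasingly.
    """
    n = len(weights)
    # Relabelling: allowed labels keep their relative order, forbidden ones go last.
    order = [i for i in range(n) if i not in forbidden] + sorted(forbidden)
    epidemic_label = order[pick_label([weights[i] for i in order], x)]
    branching_label = pick_label(weights, x)
    epidemic = None if epidemic_label in forbidden else epidemic_label
    branching = None if x > 1.0 - gamma else branching_label
    return epidemic, branching


if __name__ == "__main__":
    weights = [1.0, 2.0, 3.0, 4.0]   # illustrative weights, sorted increasingly
    forbidden = {1}                  # forbidden weight 2 < gamma * total = 3
    for x in (0.05, 0.5, 0.95):
        print(x, coupled_pick(weights, forbidden, x, gamma=0.3))
```

In this toy example the forbidden weight is below $\gamma$ times the total weight, so whenever the epidemic draw is ignored the branching process draw is ignored too, and a retained epidemic draw never has smaller weight than the branching process draw.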

We then pick one of the cliques added to $\mathcal{R}^{(n,\gamma)}$ whose corresponding litter was not
ignored in $Z^{(n,\gamma)}$. We realise a local epidemic in this group as follows. Assume that the
B-weight of the clique is $b_1$. Then let $d_1$ be $\mathcal{P}(b_1 L^{(n)}/(\mu n))$. Consider a population with $d_1$
initial susceptible individuals and 1 initial infectious individual, all with infectious period
distributed as $\mathcal{I}(c_1)$, and couple two continuous time epidemics in this population as follows.
Consider the first newly infected individual in this population. We associate this individual
with vertices in $\mathcal{R}^{(n,\gamma)}$ and in $Z^{(n,\gamma)}$ as follows. Choose a real number, say $x$, uniformly at
random from the unit interval. In $\mathcal{R}^{(n,\gamma)}$, we try to connect clique $v'_1$ to the vertex with
label $i$, which satisfies

$$\sum_{j \in \mathbb{N} : \sigma_0^{(n)}(j) < \sigma_0^{(n)}(i)} A_j < x L^{(n)} \le \sum_{j \in \mathbb{N} : \sigma_0^{(n)}(j) \le \sigma_0^{(n)}(i)} A_j.$$

Suppose that this vertex is $v_1$. The A-weight of the possible child in $Z^{(n,\gamma)}$ is $A_i$, where $i$
is such that $\sum_{j=1}^{i-1} A_j < x L^{(n)} \le \sum_{j=1}^{i} A_j$. The vertex we choose is denoted by $v_1$.

If $v_1 \in \Gamma_0$, then the vertex is ignored in $\mathcal{R}^{(n,\gamma)}$ and immediately killed. If $x > 1 - \gamma$, then
the child in $Z^{(n,\gamma)}$ is ignored. We note that as long as the weight of $\Gamma_0$ is less than $\gamma L^{(n)}$, a
vertex can be ignored in $\mathcal{R}^{(n,\gamma)}$ only if the child in $Z^{(n,\gamma)}$ is also ignored. Furthermore, the
A-weight of the vertex in $Z^{(n,\gamma)}$ is not larger than the A-weight of the vertex in $\mathcal{R}^{(n,\gamma)}$. We
identify the other vertices infected by local epidemics started by $v_0$ and the corresponding
children in $Z^{(n,\gamma)}$ as we have identified the cliques $v_0$ is part of, where at each step the
forbidden set of vertices might grow and the chosen vertex gets the highest label for the next
vertex pick. The infectious period/type assigned to every vertex (which is not immediately
killed) is distributed as $\mathcal{I}(c_1)$ and coupled vertices get the same infectious period/type. We
continue in this way until we have identified all vertices infected by local epidemics started
by $v_0$ and we then explore the cliques those individuals are part of one by one, as before.

The exploration process $\mathcal{R}^{(n,\gamma)}$ dominates the exploration process $Z^{(n,\gamma)}$ until the total
weight of the forbidden set in $V^{(n)}$ in $\mathcal{R}^{(n,\gamma)}$ is at least $\gamma L^{(n)}$ or the total weight of the
forbidden set in $V'^{(n)}$ in $\mathcal{R}^{(n,\gamma)}$ is at least $\gamma L'^{(n)}$.

Note that we may choose $c_1 > 0$ small enough such that $\mathrm{P}(\mathcal{I} < c_1) < \gamma/2$. By the
law of large numbers this implies that $c_1 > 0$ might be chosen such that the total weight
of vertices in $I(c_1)$ is less than $(\gamma/2)L^{(n)}$ with probability tending to 1 as $n \to \infty$. By
Lemma 5.7, we know that the weights of $K^1$, $K^2$, $K'^1$ and $K'^2$ are each a.s. $o(n)$ and we
know that the set of vertices with label $\ge R'^{(n)}(\gamma/3)$ has total weight at least $(\gamma/3)L'^{(n)}$
and the probability that this total weight is less than $(\gamma/2)L'^{(n)}$ can be made arbitrarily
close to 1 by choosing $n$ sufficiently large.

If the ordering of the exploration processes $\mathcal{R}^{(n,\gamma)}$ and $Z^{(n,\gamma)}$ stops because the total
weight of the forbidden set in $V'^{(n)}$ exceeds $\gamma L'^{(n)}$, then, using Lemma 5.11, the lemma is
immediate with $\eta = \gamma/3$. If this ordering stops because the total weight of the forbidden set
in $V^{(n)}$ exceeds $\gamma L^{(n)}$, then the total weight of vertices in $\mathcal{R}^{(n,\gamma)}$ that are not in the original
forbidden set exceeds $(\gamma/3)L^{(n)}$. We now proceed as follows. Since all of the vertices in $\hat V'^{(n)}$
have weight at most $\log n$, the number of vertices with labels exceeding $R'^{(n)}(\gamma/3)$ grows
to infinity and, by the law of large numbers, we find that the total weight of cliques in this
set which contain vertices in $\mathcal{R}^{(n,\gamma)}$ is $\Theta(n)$. This completes the proof of the lemma. □

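Proof of Theorem 5.2. We use the notation of Lemma 5.12. Recall that $\tilde{\mathcal{R}}^{(n)} = \mathcal{R}^{(n,0)}$
and that $\mathcal{R}^{(n)} = \mathcal{R}^{(n)}(\omega, \mathcal{I})$ is the set of ultimately infected vertices in a population of $n$
individuals.

We first provide bounds for

$$\mathrm{E}_\omega\big[n^{-1}|\mathcal{R}^{(n)}| \,\big|\, \mathcal{W}'^{(n)}(c_1) > \eta n\big] = \mathrm{E}_\omega\Big[n^{-1} \sum_{i=1}^{n} \mathbb{1}(v_i \in \mathcal{R}^{(n)}) \,\Big|\, \mathcal{W}'^{(n)}(c_1) > \eta n\Big]$$
$$= \mathrm{P}_\omega\big(v_1 \in \mathcal{R}^{(n)} \,\big|\, \mathcal{W}'^{(n)}(c_1) > \eta n\big)$$

and for

$$\mathrm{E}_\omega\big[n^{-2}|\mathcal{R}^{(n)}|^2 \,\big|\, \mathcal{W}'^{(n)}(c_1) > \eta n\big] = \mathrm{E}_\omega\Big[n^{-2} \sum_{i=1}^{n} \sum_{j=1}^{n} \mathbb{1}(v_i, v_j \in \mathcal{R}^{(n)}) \,\Big|\, \mathcal{W}'^{(n)}(c_1) > \eta n\Big]$$
$$= n^{-1} \mathrm{P}_\omega\big(v_1 \in \mathcal{R}^{(n)} \,\big|\, \mathcal{W}'^{(n)}(c_1) > \eta n\big) + (1 - n^{-1}) \mathrm{P}_\omega\big(v_1, v_2 \in \mathcal{R}^{(n)} \,\big|\, \mathcal{W}'^{(n)}(c_1) > \eta n\big).$$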

Let $\epsilon' > 0$. By Lemma 5.8 and the asymptotic theory of supercritical general branching
processes [24] modified to the lattice case, we have that, if the susceptibility set of $v_1$
in $\hat G^{(n)}$ survives for $t_n = \lceil \log\log n \rceil$ generations, then there exists $c_2 > 0$ such that the
probability that the number and the total weight of the vertices in this generation is at
least $c_2 \log\log n$ is greater than $1 - \epsilon'$ for all sufficiently large $n$. We denote the set of vertices
in generation $t_n$ of this susceptibility set by $\hat S^1_{t_n}$. The same holds for the susceptibility set
of $v_2$. Furthermore, the events of survival up to generation $t_n$ of the two susceptibility sets
are asymptotically independent by a birthday problem type of argument and Lemma 5.7.

Conditioned on $\mathcal{W}'^{(n)}(c_1) > \eta n$, the law of large numbers establishes that the following
event occurs with probability exceeding $1 - \epsilon'$. The number of vertices in $\hat S^1_{t_n}$ that both (i)
are in the same clique as an infected vertex explored in $\tilde{\mathcal{R}}^{(n)}$ and (ii) have infectious period
at least $c_1$, grows to infinity as $n \to \infty$. Since each vertex in $\hat S^1_{t_n}$ is infected independently
with probability at least $1 - e^{-c_1} > 0$, we have that

$$\mathbb{1}\Big(\mathrm{P}_\omega\big(v_1 \in \mathcal{R}^{(n)} \,\big|\, |\hat S^1_{t_n}| > 0, \mathcal{W}'^{(n)}(c_1) > \eta n\big) > 1 - 2\epsilon'\Big) \to 1 \quad \text{as } n \to \infty.$$



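Furthermore, if the susceptibility set of $v_1$ does not survive up to generation $t_n$ in $\hat G^{(n)}$, then
Lemma 5.10 shows that the probability that the initial infective is in $v_1$'s susceptibility set
converges to 0. More precisely, for every $K \in \mathbb{N}$ we have that

$$\mathrm{P}_\omega\big(v_1 \in \mathcal{R}^{(n)} \,\big|\, |\hat S^1_{t_n}| = 0\big) = \frac{\mathrm{P}_\omega(v_1 \in \mathcal{R}^{(n)}, |\hat S^1_{t_n}| = 0)}{\mathrm{P}_\omega(|\hat S^1_{t_n}| = 0)}$$
$$\le \frac{\mathrm{P}_\omega(v_1 \in \mathcal{R}^{(n)}, |S^1| \le K) + \mathrm{P}_\omega(|S^1| > K, |\hat S^1_{t_n}| = 0)}{\mathrm{P}_\omega(|\hat S^1_{t_n}| = 0)}.$$

The first term in the numerator of the right hand side of this inequality converges to 0 as
$n \to \infty$, while by Lemma 5.10 we have that, for every $\epsilon > 0$ and $\delta > 0$, there exists $K \in \mathbb{N}$
such that the second term in the numerator is smaller than $\epsilon$ with $\nu$-probability at least
$1 - \delta$ for all sufficiently large $n$. The denominator is trivially strictly positive. We therefore
conclude that

$$\mathrm{P}_\omega\big(v_1 \in \mathcal{R}^{(n)} \,\big|\, |\hat S^1_{t_n}| = 0\big) \to 0 \quad \text{as } n \to \infty.$$

From the proof of Lemma 5.12 we deduce that

$$\mathrm{P}_\omega(|\hat S^1_{t_n}| > 0) - \mathrm{P}_\omega\big(|\hat S^1_{t_n}| > 0 \,\big|\, \mathcal{W}'^{(n)}(c_1) > \eta n\big) \to 0,$$

whence

$$\mathrm{P}_\omega\big(v_1 \in \mathcal{R}^{(n)} \,\big|\, \mathcal{W}'^{(n)}(c_1) > \eta n\big) - \mathrm{P}_\omega(|\hat S^1_{t_n}| > 0) \to 0.$$

Now, arguing as at the start of the proof of Lemma 5.8,

$$\mathrm{P}_\omega(|\hat S^1_{t_n}| > 0) - \mathrm{P}_\omega\big(|Z^b_{t_n}(\hat A^{(n)}, \hat B^{(n)}, \mathcal{I})| > 0\big) \to 0 \quad \text{as } n \to \infty,$$

whilst the end of the proof of Lemma 5.8 shows that

$$\mathrm{P}_\omega\big(|Z^b_{t_n}(\hat A^{(n)}, \hat B^{(n)}, \mathcal{I})| > 0\big) \to p^b(A, B, \mathcal{I}) \quad \text{as } n \to \infty.$$

Thus, $\mathrm{P}_\omega(|\hat S^1_{t_n}| > 0) \to p^b(A, B, \mathcal{I})$, whence

$$\mathrm{E}_\omega\big[n^{-1}|\mathcal{R}^{(n)}| \,\big|\, \mathcal{W}'^{(n)}(c_1) > \eta n\big] \to p^b(A, B, \mathcal{I}).$$

Since the first $t_n$ generations of the susceptibility sets of $v_1$ and $v_2$ in $\hat G^{(n)}$ are
non-overlapping with probability tending to 1 as $n \to \infty$, we notice that

$$\mathrm{P}_\omega\big(v_1, v_2 \in \mathcal{R}^{(n)} \,\big|\, \mathcal{W}'^{(n)}(c_1) > \eta n\big) - \Big(\mathrm{P}_\omega\big(v_1 \in \mathcal{R}^{(n)} \,\big|\, \mathcal{W}'^{(n)}(c_1) > \eta n\big)\Big)^2 \to 0.$$

This gives that

$$\mathrm{E}_\omega\big[n^{-2}|\mathcal{R}^{(n)}|^2 \,\big|\, \mathcal{W}'^{(n)}(c_1) > \eta n\big] \to \big(p^b(A, B, \mathcal{I})\big)^2 \quad \text{as } n \to \infty.$$

Therefore, $\mathrm{var}(n^{-1}|\mathcal{R}^{(n)}| \mid \mathcal{W}'^{(n)}(c_1) > \eta n) \to 0$ and we conclude that, for all $\delta > 0$,

$$\mathrm{P}_\omega\big(|n^{-1}|\mathcal{R}^{(n)}| - p^b(A, B, \mathcal{I})| < \delta \,\big|\, \mathcal{W}'^{(n)}(c_1) > \eta n\big) \to 1. \qquad (5.20)$$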
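
The passage from the two moment limits to (5.20) is an application of Chebyshev's inequality. Spelled out, with the shorthand $X_n = n^{-1}|\mathcal{R}^{(n)}|$, $m_n = \mathrm{E}_\omega[X_n]$ and $p^b = p^b(A, B, \mathcal{I})$ (this shorthand is introduced here for illustration only), and with all probabilities and expectations taken conditionally on $\{\mathcal{W}'^{(n)}(c_1) > \eta n\}$:

```latex
\begin{align*}
\operatorname{var}_\omega(X_n)
  &= \mathrm{E}_\omega[X_n^2] - m_n^2 \;\to\; (p^b)^2 - (p^b)^2 = 0, \\
\mathrm{P}_\omega\bigl(|X_n - p^b| \ge \delta\bigr)
  &\le \mathrm{P}_\omega\bigl(|X_n - m_n| \ge \delta/2\bigr)
   \le \frac{4 \operatorname{var}_\omega(X_n)}{\delta^2} \;\to\; 0,
\end{align*}
```

where the first inequality in the second line holds for all $n$ large enough that $|m_n - p^b| < \delta/2$.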

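On the other hand, we know by Lemma 5.12 that for every $\epsilon' > 0$, there exist constants
$\eta > 0$ and $c_1 > 0$ such that

$$\mathbb{1}\big(\mathrm{P}_\omega(\mathcal{W}'^{(n)}(c_1) > \eta n) > p(A, B, \mathcal{I}) - \epsilon'\big) \to 1 \quad \text{as } n \to \infty. \qquad (5.21)$$

Furthermore, by Theorem 5.1 there exists $k \in \mathbb{N}$ such that

$$\mathbb{1}\Big(\sum_{i=1}^{k} \mathrm{P}_\omega(|\mathcal{R}^{(n)}| = i) > 1 - p(A, B, \mathcal{I}) - \epsilon'\Big) \to 1. \qquad (5.22)$$

Now observe that

$$\mathrm{P}_\omega\big(v_1 \in \mathcal{R}^{(n)}, \mathcal{W}'^{(n)}(c_1) \le \eta n\big) \le \mathrm{P}_\omega\big(v_1 \in \mathcal{R}^{(n)}, |\mathcal{R}^{(n)}| \le k\big) + \mathrm{P}_\omega\big(\mathcal{W}'^{(n)}(c_1) \le \eta n, |\mathcal{R}^{(n)}| > k\big).$$

By exchangeability, the first term on the right hand side of this inequality is bounded above
by $k/n$ which converges to 0 as $n \to \infty$. Further, for any $K \in \mathbb{N}$,

$$\mathrm{P}_\omega\big(\mathcal{W}'^{(n)}(c_1) > \eta n, |\mathcal{R}^{(n)}| \le K\big) \to 0 \quad \text{as } n \to \infty,$$

so (5.21) and (5.22) imply that for every $\epsilon > 0$, there exists $k \in \mathbb{N}$ such that

$$\mathbb{1}\Big(\mathrm{P}_\omega\big(\mathcal{W}'^{(n)}(c_1) \le \eta n, |\mathcal{R}^{(n)}| > k\big) < \epsilon\Big) \to 1.$$

It follows that

$$\mathrm{E}_\omega\big[n^{-1}|\mathcal{R}^{(n)}| \,\big|\, \mathcal{W}'^{(n)}(c_1) \le \eta n\big] \to 0,$$

so for every $\delta > 0$ we have

$$\mathrm{P}_\omega\big(n^{-1}|\mathcal{R}^{(n)}| < \delta \,\big|\, \mathcal{W}'^{(n)}(c_1) \le \eta n\big) \to 1 \quad \text{as } n \to \infty. \qquad (5.23)$$

Combining (5.20) and (5.23) completes the proof of Theorem 5.2. □

6 Extension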



In this paper we study the spread of an SIR epidemic on a random intersection graph. A
variant of the random intersection graph is proposed in [26], where a configuration model
construction is used to create the graph. In our terminology and notation, independent
degrees are assigned to vertices in $V$ and $V'$, where the degrees of vertices in $V$ are each
distributed as a random variable $D$ and the degrees of vertices in $V'$ are each distributed
as a random variable $H$. Each vertex in $V \cup V'$ is assigned a number of half-edges given
by its degree. In the auxiliary graph $A^{(n)}$ the half-edges of the first $n$ vertices in $V$ are
paired uniformly at random with the first $L^{(n)}$ half-edges in $V'$, where $L^{(n)}$ is the number
of half-edges assigned to the first $n$ vertices in $V$. Note that the final vertex in $V'$ used in
this construction might not retain its full degree in $A^{(n)}$.
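
The pairing step of this configuration-model variant can be sketched as follows. This is a minimal illustration under assumptions made here: the function name `pair_half_edges`, the list-based representation and the toy degree sequences are hypothetical, and the construction in [26] may differ in detail.

```python
import random
from collections import Counter


def pair_half_edges(vertex_degrees, clique_degrees, rng):
    """Pair all vertex half-edges with the first L clique half-edges.

    Returns a list of (vertex, clique) pairs; repeated pairs are possible,
    as in any configuration-model construction.
    """
    total = sum(vertex_degrees)  # L^(n): number of half-edges on the vertex side
    vertex_stubs = [v for v, d in enumerate(vertex_degrees) for _ in range(d)]
    clique_stubs = []
    for c, h in enumerate(clique_degrees):
        if len(clique_stubs) >= total:
            break
        clique_stubs.extend([c] * h)
    if len(clique_stubs) < total:
        raise ValueError("not enough clique half-edges")
    # The last clique used may keep only part of its assigned degree.
    clique_stubs = clique_stubs[:total]
    rng.shuffle(vertex_stubs)  # uniformly random pairing
    return list(zip(vertex_stubs, clique_stubs))


if __name__ == "__main__":
    edges = pair_half_edges([2, 1, 3], [2, 2, 4, 5], random.Random(7))
    print(edges)
    print(Counter(c for _, c in edges))  # clique 2 keeps 2 of its 4 half-edges
```

In the toy example the vertex side carries six half-edges, so cliques 0 and 1 are used in full, clique 2 retains two of its four half-edges and clique 3 is not used at all.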

The forward and backward branching processes can be modified in the obvious fashion 
to this setting and equivalent formulae to the key expressions (B.6), (B.7) and (B.8) in Ap- 
pendix B.2 can be derived, thus facilitating calculation of the threshold parameter and 
survival probabilities of these branching processes. We expect that, under mild conditions 
on the distributions of D and H, theorems corresponding to Theorems 3.3-3.5 hold for this 
model. Some additional dependencies arise since connecting to a vertex takes away one 
of its available half-edges, however we anticipate that the impact of those dependencies is 
very small. 

A Proof of Lemma 4.1 

In order to prove Lemma 4.1 we use an idea from Riordan [33]. He considers the
corresponding problem for a class of multitype branching processes having type space $(0, 1]$
in which, in crude terms, the number of children having type in any specified interval an
individual of type $x$ has tends to infinity as $x \downarrow 0$. We cannot use the result in [33] directly
because the number of children an individual of type $x$ has tends to zero as $x \downarrow 0$. However,
we can apply the idea in [33] to a branching process that is intimately related to $Z^f$,
which we now describe, and exploit a connection between the functional $\Phi(p)(x)$ and an
equivalent functional for the new branching process to obtain the desired result.

Recall that in the branching process $Z^f$, individuals arise in litters, with a litter being
distributed as the set of individuals that are infected in a local (single-clique) epidemic, not
including the individual who triggers that local epidemic. Consider such a local epidemic
and suppose that the clique contains the initial infective, $i^*$ say, and $m$ susceptible
individuals. The final outcome of the local epidemic can be obtained using the corresponding
Epidemic Generated Graph, by first determining the number of individuals, $a$ say, that
are contacted directly by the initial infective, and then considering the epidemic, $\mathcal{E}_{s,a}$ say,
triggered by those $a$ individuals among the remaining $s = m - a$ susceptibles in the clique.
Suppose that the epidemic $\mathcal{E}_{s,a}$ infects $T_{s,a}$ individuals, in addition to its $a$ initial infectives.
(Thus, in the notation of Section 3.2, $T(m) = a + T_{s,a}$.) Note that the infectious periods
of the $a$ initial infectives in $\mathcal{E}_{s,a}$ are i.i.d. copies of $\mathcal{I}$ and also that, conditional upon the
value of $(s, a)$, such epidemics in different cliques are mutually independent, even if they
arise from the same initial infective $i^*$. Thus the epidemic may be approximated by
a branching process of litters, in which each litter is typed by its value of $(s, a)$ and its
offspring are the litters triggered by the $a + T_{s,a}$ infectives in the corresponding $\mathcal{E}_{s,a}$. Let
$\tilde Z^f$ be the branching process derived in this fashion corresponding to the branching process
$Z^f$. Clearly, litters with $a = 0$ are superfluous, so the type space for $\tilde Z^f$ may be taken to
be $\mathcal{T} = \{(s, a) : s \in \mathbb{Z}_+, a \in \mathbb{N}\}$.

We now derive the next-generation functional (i.e. the analogue of $\Phi$) associated
with $\tilde Z^f$. For notational convenience we assume that $\mathcal{I}$ has an absolutely continuous
distribution, though this is not essential and the argument (and the proof of Lemma 4.1 below)
can be extended to the general case. Let $h(s, a) : \mathcal{T} \to [0, 1]$ be a measurable test function
and suppose that litters are marked independently with a dagger (to distinguish from the
marks used on $Z^f$), with a litter of type $(s, a)$ being marked with probability $h(s, a)$. Let
$\tilde\Phi(h)(s, a)$ be the probability that a litter of type $(s, a)$ directly spawns at least one litter
that is marked with a dagger.

Consider the epidemic $\mathcal{E}_{s,a}$ described above and suppose that $T_{s,a} = k$. Let $x_{-a+1}, x_{-a+2}, \dots, x_0$
and $x_1, x_2, \dots, x_k$ denote the lengths of the infectious periods of the $a$ initial infectives and
the $k$ subsequently infected individuals, respectively. Let $p_{s,a}(k; x_{-a+1}, x_{-a+2}, \dots, x_0, x_1, \dots, x_k)$
be the probability density that $T_{s,a} = k$ and the infectious periods are given by $x_{-a+1}, \dots, x_k$.
Then,

$$\tilde\Phi(h)(s, a) = 1 - \sum_{k=0}^{s} \int_{(0,\infty]^{a+k}} p_{s,a}(k; x_{-a+1}, \dots, x_k) \prod_{i=-a+1}^{k} P_h(x_i) \, dx_{-a+1} \cdots dx_k, \qquad (A.1)$$

where $P_h(x)$ is the probability that an individual, $i^*$ say, having infectious period of length
$x$, does not spawn a litter which is marked with a dagger.

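To determine $P_{\tilde{h}}(x)$, note first that $i^*$ belongs to $X \sim \mathcal{MP}(A)$ cliques, not counting the
clique it was infected through, and consider one such clique. Besides $i^*$, this clique contains
$Y \sim \mathcal{MP}(B)$ individuals. Suppose that $B = b$; then $Y \sim \mathcal{P}(b)$ and these $Y$ individuals are
infected independently by $i^*$, each with probability $1 - e^{-x}$. Thus, given $B = b$, the litter
has type $(s,a)$, where $s$ and $a$ are independent realisations of the Poisson random variables
$\mathcal{P}(be^{-x})$ and $\mathcal{P}(b(1 - e^{-x}))$, respectively. Hence, the unconditional probability that this litter
is not marked with a dagger is

\[
E\left[ \sum_{s=0}^{\infty} \sum_{a=0}^{\infty}
\frac{(e^{-x}B)^s \, ((1-e^{-x})B)^a \, e^{-B}}{s!\, a!} \bigl(1 - \tilde{h}(s,a)\bigr) \right],
\]

where $\tilde{h}(s,0) = 0$ $(s \in \mathbb{Z}_+)$. Given that $i^*$ has infectious period $x$, the local epidemics it
initiates in the above $X$ cliques are independent, so

\[
P_{\tilde{h}}(x) = \phi_A\!\left( 1 - E\left[ \sum_{s=0}^{\infty} \sum_{a=0}^{\infty}
\frac{(e^{-x}B)^s \, ((1-e^{-x})B)^a \, e^{-B}}{s!\, a!} \bigl(1 - \tilde{h}(s,a)\bigr) \right] \right),
\tag{A.2}
\]

where $\phi_A(\theta) = E[e^{-\theta A}]$ denotes the Laplace transform of $A$.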
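The step from a single clique to all $X$ cliques rests on a standard mixed-Poisson identity:
if each clique independently yields no dagger-marked litter with probability $q$, then
$E[q^X] = E[e^{-A(1-q)}]$. The following is a minimal numerical sketch of that identity, under
the purely illustrative assumption (not made in the paper) that $A$ is gamma distributed, so
that $X$ is negative binomial. All function names are hypothetical.

```python
from math import exp, lgamma, log

def mixed_poisson_gamma_pmf(n, shape, rate):
    """P(X = n) when X | A ~ Poisson(A) and A ~ Gamma(shape, rate) (assumed example)."""
    p = rate / (rate + 1.0)
    # Negative binomial pmf, evaluated on the log scale for numerical safety.
    log_pmf = (lgamma(n + shape) - lgamma(n + 1) - lgamma(shape)
               + shape * log(p) + n * log(1.0 - p))
    return exp(log_pmf)

def pgf_by_summation(q, shape, rate, n_max=400):
    """E[q^X], computed by summing the pmf directly (truncated at n_max)."""
    return sum(q ** n * mixed_poisson_gamma_pmf(n, shape, rate)
               for n in range(n_max + 1))

def laplace_transform_gamma(theta, shape, rate):
    """E[exp(-theta * A)] for A ~ Gamma(shape, rate)."""
    return (rate / (rate + theta)) ** shape

q, shape, rate = 0.6, 2.0, 1.5
lhs = pgf_by_summation(q, shape, rate)
rhs = laplace_transform_gamma(1.0 - q, shape, rate)
print(lhs, rhs)  # the two values should agree up to truncation error
```

Any other distribution for $A$ with a known Laplace transform could be substituted in the same way.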



Let $\tilde{\rho}(s,a)$ be the survival probability of the branching process $\tilde{Z}^f$ given that the
initial litter has type $(s,a)$. Then $\tilde{\rho}$ is the maximal solution of $\tilde{\rho}(s,a) = \tilde{\Phi}(\tilde{\rho})(s,a)$. If
either $s \to \infty$ or $a \to \infty$, then for any $(s',a') \in \mathcal{T}$ and any $K \in \mathbb{N}$, the probability that a
type-$(s,a)$ individual has at least $K$ type-$(s',a')$ children in the next generation tends to 1.
Furthermore, it is easy to deduce that for any $(s,a), (s',a') \in \mathcal{T}$, the number of type-$(s',a')$
children an individual of type $(s,a)$ begets is non-zero with positive probability, so $\tilde{Z}^f$ is
irreducible. Using the same argument as in [33, pp. 911-912], we conclude that there is at
most one non-zero solution of $\tilde{\rho}(s,a) = \tilde{\Phi}(\tilde{\rho})(s,a)$.

Recall that Lemma 4.1 states that there is at most one non-zero solution $\rho(x)$ of the
functional equation $\rho(x) = \Phi(\rho)(x)$. To prove this it is useful to derive an alternative
expression for $\Phi(h)(x)$. Suppose that the initial ancestor, $i^*$ say, in $Z^f$ has infectious
period of length $x$. By conditioning on the size of, and the number of people directly
infected by $i^*$ in, a given clique, the probability that $i^*$ has no marked child in that clique
is given by

\[
E\left[ \sum_{s=0}^{\infty} \sum_{a=0}^{\infty}
\frac{(e^{-x}B)^s \, ((1-e^{-x})B)^a \, e^{-B}}{s!\, a!} \, A(s,a,h) \right], \tag{A.3}
\]

where

\[
A(s,a,h) = \sum_{k=0}^{s} \int_{(0,\infty]^{a+k}} p_{s,a}(k; x_{-a+1}, \dots, x_k)
\prod_{i=-a+1}^{k} \bigl(1 - h(x_i)\bigr)\, dx_{-a+1} \cdots dx_k, \tag{A.4}
\]

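whence, since $i^*$ belongs to $X \sim \mathcal{MP}(A)$ further cliques (in addition to the one it was
infected through),

\[
\Phi(h)(x) = 1 - \phi_A\!\left( 1 - E\left[ \sum_{s=0}^{\infty} \sum_{a=0}^{\infty}
\frac{(e^{-x}B)^s \, ((1-e^{-x})B)^a \, e^{-B}}{s!\, a!} \, A(s,a,h) \right] \right). \tag{A.5}
\]

Suppose that

\[
h(x) = \Phi(h)(x). \tag{A.6}
\]

Then (A.5) and (A.4) imply that

\[
A(s,a,h) = \sum_{k=0}^{s} \int_{(0,\infty]^{a+k}} p_{s,a}(k; x_{-a+1}, \dots, x_k) \times
\prod_{i=-a+1}^{k} \phi_A\!\left( 1 - E\left[ \sum_{s_i=0}^{\infty} \sum_{a_i=0}^{\infty}
\frac{(e^{-x_i}B)^{s_i} \, ((1-e^{-x_i})B)^{a_i} \, e^{-B}}{s_i!\, a_i!} \, A(s_i,a_i,h) \right] \right)
dx_{-a+1} \cdots dx_k.
\]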

Thus, by (A.1) and (A.2), if $h$ is treated as fixed, $\tilde{h}(s,a) = 1 - A(s,a,h)$ satisfies

\[
\tilde{h}(s,a) = \tilde{\Phi}(\tilde{h})(s,a). \tag{A.7}
\]

Let $h$ be a non-zero (i.e. not identically zero) solution of (A.6), assuming such a solution
exists. Then $\tilde{h}$ must be the unique non-zero solution of (A.7), $\tilde{\rho}$ say. (Note that if $\tilde{h}$ is
identically zero then (A.5) and (A.6) imply that $h$ is identically zero.) Thus $\tilde{h}(s,a) =
1 - A(s,a,h)$ is independent of $h$, and $h(x)$ is given by the right hand side of (A.5) with
$A(s,a,h)$ replaced by $1 - \tilde{\rho}(s,a)$, which proves the lemma.
B Calculation of properties of forward and backward
branching processes

In this appendix we give expressions for properties of the forward and backward branching
processes, $Z^f$ and $Z^b$, which enable the threshold parameter $R_*$ and the survival
probabilities $\rho$ and $\rho^b$ which appear in Theorem 3.5 to be computed. These expressions rest on
results for the final outcome of homogeneously mixing SIR epidemic models. In a series of
papers, see for example [31], Lefèvre and Picard showed that many quantities related to
the final outcome of an SIR epidemic can be expressed compactly in terms of Gontcharoff
polynomials, and these were extended by Ball and O'Neill [6] to include so-called general
final state random variables. The latter are required to compute functionals associated
with the forward branching process $Z^f$. Results for homogeneously mixing SIR epidemic
models are outlined in Section B.1 and their application to computing properties of $Z^f$ and
$Z^b$ is described in Section B.2.

B.1 Results for homogeneously mixing populations

In this section we give a restatement of Theorem 4.2 from Ball and O'Neill [6], adapted
to the purposes of this paper (cf. [8]). We note that Ball and O'Neill provide appreciably
more general results than their Theorem 4.2. In order to state the theorem, we need the
following notation. We consider an SIR epidemic in a homogeneously mixing population
with $s$ initial susceptible individuals and $a$ initial infectious individuals. The initial
susceptible individuals are labeled $1, 2, \dots, s$ and the initial infectious individuals have labels
$-a+1, -a+2, \dots, 0$. The random variable $I_i$ represents the infectious period that
individual $i$ will have if it becomes infected. Thus, the probability that individual $i$, if infected,
ultimately has an infectious contact with individual $j$ is $1 - e^{-I_i}$. (As before, infectious
contacts between pairs of individuals are governed by independent unit-rate Poisson processes.)
We assume that the random variables $(I_i,\ i = -a+1, -a+2, \dots, s)$ are independent and
all distributed as $I$; they are also independent of the Poisson processes describing
infectious contacts. Note that this model is the epidemic $\mathcal{E}_{s,a}$ introduced in Appendix A. Let
$h(x) : (0,\infty] \to [0,\infty]$ be a measurable function (the relevant measures are clear from the
context) and $\theta > 0$. Furthermore, let

\[
U = U(h,\theta) = (u_i(h,\theta),\ i \in \mathbb{Z}_+) = (u_i,\ i \in \mathbb{Z}_+)
\]

be an infinite vector, where $u_k = E[e^{-kI} e^{-\theta h(I)}]$. Let $\mathcal{R}$ be the set of ultimately recovered
individuals in $\mathcal{E}_{s,a}$, including the initial infectives as well as any initial susceptibles that
become infected.

The Gontcharoff polynomials $G_m(x|U)$, $m \in \mathbb{Z}_+$, are defined recursively by

\[
\sum_{k=0}^{m} \frac{m!}{(m-k)!} (u_k)^{m-k} G_k(x|U) = x^m \tag{B.1}
\]

for $m \in \mathbb{Z}_+$. We note that $G_m(x|U)$ is a polynomial of degree $m$, which depends on
$u_0, u_1, \dots, u_{m-1}$. Some properties of Gontcharoff polynomials are mentioned in Section 2
of [6]. In this paper we use only (B.1) and

\[
G_m(x|U) = \int_{u_0}^{x} \int_{u_1}^{\xi_0} \cdots \int_{u_{m-1}}^{\xi_{m-2}} d\xi_{m-1} \cdots d\xi_0, \tag{B.2}
\]

for $m \in \mathbb{Z}_+$. The following theorem is a special case of Theorem 4.2 in [6], which allows $h$
to be random.

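Theorem B.1. For $\mathcal{R}$, $h$ and $U$ as above, we have

\[
E\left[ x^{s+a-|\mathcal{R}|} \, e^{-\theta \sum_{i \in \mathcal{R}} h(I_i)} \right]
= \sum_{k=0}^{s} \frac{s!}{(s-k)!} (u_k)^{s-k+a} G_k(x|U).
\]

We use the following corollary of this theorem.

Corollary B.2. Let $U = U(h) = (u_i(h),\ i \in \mathbb{Z}_+) = (u_i,\ i \in \mathbb{Z}_+)$, where $u_i = E[e^{-iI}(1 - h(I))]$
and $h(x) : (0,\infty] \to [0,1]$ is Borel-measurable, and let $\mathcal{R}$ be as above. Then

\[
E\left[ \prod_{i \in \mathcal{R}} \bigl(1 - h(I_i)\bigr) \right]
= \sum_{k=0}^{s} \frac{s!}{(s-k)!} (u_k)^{s-k+a} G_k(1|U). \tag{B.3}
\]

Proof. Set $x = \theta = 1$ and replace $h$ by $-\log(1 - h)$ in Theorem B.1. □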
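The recursive definition of the Gontcharoff polynomials can be evaluated directly by solving
for the $k = m$ term, $G_m(x|U) = \bigl(x^m - \sum_{k<m} \frac{m!}{(m-k)!} u_k^{m-k} G_k(x|U)\bigr)/m!$. The
following sketch does this and compares the first few values with the closed forms obtained
by carrying out the iterated integrals. The function name and the numerical values of $u_k$
are illustrative assumptions, not taken from the paper.

```python
from math import factorial

def gontcharoff(m_max, x, u):
    """Return [G_0(x|U), ..., G_{m_max}(x|U)] from the triangular recursion.

    Only u[0], ..., u[m_max - 1] are used.
    """
    g = []
    for m in range(m_max + 1):
        # Sum of the terms k < m in the recursion for index m.
        partial = sum(factorial(m) // factorial(m - k) * u[k] ** (m - k) * g[k]
                      for k in range(m))
        # The k = m term has coefficient m!, so solve for G_m.
        g.append((x ** m - partial) / factorial(m))
    return g

# Hypothetical decreasing sequence u_0 > u_1 > u_2 in [0, 1].
u = [0.9, 0.7, 0.5]
x = 1.0
g = gontcharoff(3, x, u)

# Closed forms obtained from the iterated-integral representation.
g1 = x - u[0]
g2 = (x ** 2 - u[0] ** 2) / 2 - u[1] * (x - u[0])
g3 = (x ** 3 - u[0] ** 3) / 6 - u[1] ** 2 * (x - u[0]) / 2 - u[2] * g2
print(g, [1.0, g1, g2, g3])
```

Floating-point evaluation of the recursion is adequate for small $m$; for large $m$ exact
rational arithmetic may be preferable because the recursion involves cancellation.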

Recall the random variable $T(m)$ introduced in Section 3.2. In the present notation,
$T(m)$ is the size of the epidemic $\mathcal{E}_{m,1}$, not including the initial infective. The mean of
$T(m)$ can be expressed in terms of Gontcharoff polynomials as follows (see e.g. [4, equation
(3.6)]):

\[
E[T(m)] = m - \sum_{k=1}^{m} \frac{m!}{(m-k)!} (v_{k-1})^{m-k+1} G_{k-1}(1|V) \qquad (m = 1, 2, \dots), \tag{B.4}
\]

where $v_k = E[e^{-(k+1)I}]$ and $V = (v_i,\ i \in \mathbb{Z}_+)$.
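As an illustration of how the mean final size can be computed from Gontcharoff polynomials,
the sketch below evaluates $E[T(m)] = m - \sum_{k=1}^{m} \frac{m!}{(m-k)!} v_{k-1}^{m-k+1} G_{k-1}(1|V)$
and checks $m = 1, 2$ against direct calculations. It assumes, purely for illustration, an
exponentially distributed infectious period with rate `gamma`, so that
$v_k = \gamma/(\gamma + k + 1)$. Function names are hypothetical.

```python
from math import factorial

def gontcharoff(m_max, x, u):
    """[G_0(x|U), ..., G_{m_max}(x|U)] via the recursive definition."""
    g = []
    for m in range(m_max + 1):
        partial = sum(factorial(m) // factorial(m - k) * u[k] ** (m - k) * g[k]
                      for k in range(m))
        g.append((x ** m - partial) / factorial(m))
    return g

def mean_final_size(m, v):
    """E[T(m)]: mean number of the m initial susceptibles infected by one infective.

    v[k] must equal E[exp(-(k + 1) I)] for k = 0, ..., m - 1.
    """
    if m == 0:
        return 0.0
    g = gontcharoff(m - 1, 1.0, v)
    expected_escapes = sum(
        factorial(m) // factorial(m - k) * v[k - 1] ** (m - k + 1) * g[k - 1]
        for k in range(1, m + 1))
    return m - expected_escapes

gamma = 1.0  # assumed recovery rate of an exponential infectious period
v = [gamma / (gamma + k + 1) for k in range(12)]

# m = 1: the single susceptible is infected with probability 1 - v_0.
direct_m1 = 1 - v[0]
# m = 2: each susceptible escapes with probability v_1 + v_0 * (v_0 - v_1).
direct_m2 = 2 - 2 * (v[1] + v[0] * (v[0] - v[1]))
print(mean_final_size(1, v), direct_m1)
print(mean_final_size(2, v), direct_m2)
```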

The distribution of the size of the local susceptibility set of an individual can also be
expressed using Gontcharoff polynomials. Recall from Section 3.3 that $S(m)$ is the size of
the local susceptibility set of an individual in a clique of size $m + 1$, where $S(m)$ does not
include the individual in question. As in [8, Section 3], we have

\[
P(S(m) = k) = \frac{m!}{(m-k)!} (v_k)^{m-k} G_k(1|V) \qquad (k = 0, 1, \dots, m), \tag{B.5}
\]

where $v_k$ and $V$ are as in (B.4).
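A quick consistency check on the susceptibility-set distribution is that the probabilities
$\frac{m!}{(m-k)!} v_k^{m-k} G_k(1|V)$, $k = 0, \dots, m$, sum to one, which is exactly the defining
recursion of the Gontcharoff polynomials evaluated at $x = 1$. The sketch below verifies this
numerically, again under the illustrative assumption of an exponential infectious period with
rate `gamma`. Function names are hypothetical.

```python
from math import factorial

def gontcharoff(m_max, x, u):
    """[G_0(x|U), ..., G_{m_max}(x|U)] via the recursive definition."""
    g = []
    for m in range(m_max + 1):
        partial = sum(factorial(m) // factorial(m - k) * u[k] ** (m - k) * g[k]
                      for k in range(m))
        g.append((x ** m - partial) / factorial(m))
    return g

def susceptibility_set_pmf(m, v):
    """[P(S(m) = 0), ..., P(S(m) = m)]; v[k] = E[exp(-(k + 1) I)], k = 0, ..., m."""
    g = gontcharoff(m, 1.0, v)
    return [factorial(m) // factorial(m - k) * v[k] ** (m - k) * g[k]
            for k in range(m + 1)]

gamma = 1.0  # assumed recovery rate of an exponential infectious period
v = [gamma / (gamma + k + 1) for k in range(12)]

pmf2 = susceptibility_set_pmf(2, v)
# With two other clique members: nobody contacts the individual with
# probability v_0^2; exactly one member joins the set with probability
# 2 * v_1 * (1 - v_0).
print(pmf2, v[0] ** 2, 2 * v[1] * (1 - v[0]))
print(sum(susceptibility_set_pmf(8, v)))
```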

B.2 Application to branching processes $Z^f$ and $Z^b$

Let $h$ be as in Corollary B.2 and suppose that individuals in $\mathcal{E}_{s,a}$ are marked
independently, with individual $i$ being marked with probability $h(I_i)$ $(i = -a+1, -a+2, \dots, s)$.
Then (B.3) gives the probability that the epidemic $\mathcal{E}_{s,a}$ contains no marked infective.
Recall from Section 4 that $F(h)(x)$ is the probability that the initial ancestor in $Z^f$ has at
least one marked child arising from the local epidemic in a given clique. Arguing as in the
derivation of (A.3) gives, after repeatedly using Fubini's theorem (note that $G_k(1|U) \ge 0$
for all $k$, using (B.2) and the fact that $(u_k \in [0,1])$ is decreasing in $k$),

\begin{align*}
1 - F(h)(x)
&= E\left[ \sum_{s=0}^{\infty} \sum_{a=0}^{\infty}
\frac{e^{-B} (e^{-x})^s B^s (1-e^{-x})^a B^a}{s!\, a!}
\sum_{k=0}^{s} \frac{s!}{(s-k)!} (u_k)^{s-k+a} G_k(1|U) \right] \\
&= E\left[ \sum_{s=0}^{\infty} \sum_{k=0}^{s}
\frac{(e^{-x}B)^s}{(s-k)!} (u_k)^{s-k} G_k(1|U) \, e^{-B(1 - u_k(1 - e^{-x}))} \right] \\
&= E\left[ \sum_{k=0}^{\infty} \sum_{s=k}^{\infty}
\frac{(e^{-x}B)^s}{(s-k)!} (u_k)^{s-k} G_k(1|U) \, e^{-B(1 - u_k(1 - e^{-x}))} \right] \\
&= E\left[ \sum_{k=0}^{\infty} e^{-xk} B^k e^{-B(1-u_k)} G_k(1|U) \right] \\
&= \sum_{k=0}^{\infty} (-e^{-x})^k \, \phi_B^{(k)}(1 - u_k) \, G_k(1|U), \tag{B.6}
\end{align*}

where $\phi_B^{(k)}$ is the $k$th derivative of $\phi_B$.
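The chain of equalities leading to (B.6) can be checked numerically in a simple special case.
The sketch below assumes (for illustration only) that $B \equiv b$ is constant, so that
$\phi_B(\theta) = e^{-\theta b}$ and $(-e^{-x})^k \phi_B^{(k)}(1 - u_k) = (b e^{-x})^k e^{-b(1-u_k)}$, that the
infectious period is exponential with rate 1, and that $h \equiv c$ is constant, giving
$u_k = (1-c)/(1+k)$. It compares the truncated double sum over litter types $(s,a)$ with the
single series over $k$. Function names and parameter values are hypothetical.

```python
from math import exp, factorial

def gontcharoff(m_max, x, u):
    """[G_0(x|U), ..., G_{m_max}(x|U)] via the recursive definition."""
    g = []
    for m in range(m_max + 1):
        partial = sum(factorial(m) // factorial(m - k) * u[k] ** (m - k) * g[k]
                      for k in range(m))
        g.append((x ** m - partial) / factorial(m))
    return g

def poisson_pmf(n, mean):
    return exp(-mean) * mean ** n / factorial(n)

def no_marked_child_direct(x, b, u, n_max):
    """First line of the derivation with B = b fixed, truncated at n_max."""
    g = gontcharoff(n_max, 1.0, u)
    total = 0.0
    for s in range(n_max + 1):
        for a in range(n_max + 1):
            weight = (poisson_pmf(s, b * exp(-x))
                      * poisson_pmf(a, b * (1 - exp(-x))))
            # Probability that the local epidemic contains no marked infective.
            unmarked = sum(factorial(s) // factorial(s - k)
                           * u[k] ** (s - k + a) * g[k] for k in range(s + 1))
            total += weight * unmarked
    return total

def no_marked_child_series(x, b, u, n_max):
    """Last line of the derivation with B = b fixed, truncated at n_max."""
    g = gontcharoff(n_max, 1.0, u)
    return sum((b * exp(-x)) ** k * exp(-b * (1 - u[k])) * g[k]
               for k in range(n_max + 1))

x, b, c, n_max = 0.7, 1.0, 0.3, 20  # assumed illustrative values
u = [(1 - c) / (1 + k) for k in range(n_max + 1)]
direct = no_marked_child_direct(x, b, u, n_max)
series = no_marked_child_series(x, b, u, n_max)
print(direct, series)
```

For a genuinely random $B$ the same comparison would require the derivatives of its Laplace
transform; the constant case is used here only because those derivatives are explicit.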



Finally, we derive expressions for $E_Y[E[T(Y)|Y]]$ and $E_Y[f_{S(Y)|Y}(s)]$, where $Y \sim \mathcal{MP}(B)$,
which are required to compute $R_*$ and $\rho^b$, see (3.1) and (3.5), respectively. Recall that
$(Y|B = b) \sim \mathcal{P}(b)$ and $E[Y] = E[B]$. Thus conditioning on $B$ and using (B.4) yields

\[
E_Y[E[T(Y)|Y]] = E[B] - E\left[ \sum_{m=1}^{\infty} \frac{B^m}{m!} e^{-B}
\sum_{k=1}^{m} \frac{m!}{(m-k)!} (v_{k-1})^{m+1-k} G_{k-1}(1|V) \right].
\]

Interchanging the order of summation then yields, after elementary algebra, that

\[
E_Y[E[T(Y)|Y]] = E[B] - \sum_{k=1}^{\infty} (-1)^k \, v_{k-1} \, \phi_B^{(k)}(1 - v_{k-1}) \, G_{k-1}(1|V). \tag{B.7}
\]
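The interchange of summation behind (B.7) can be illustrated numerically. The sketch below
assumes (for illustration only) that $B \equiv b$ is constant, so that $Y$ is Poisson with mean
$b$ and $(-1)^k \phi_B^{(k)}(\theta) = b^k e^{-\theta b}$, and that the infectious period is exponential with
rate `gamma`. It compares the resulting series with a direct Poisson average of the mean
final sizes $E[T(m)]$. Function names are hypothetical.

```python
from math import exp, factorial

def gontcharoff(m_max, x, u):
    """[G_0(x|U), ..., G_{m_max}(x|U)] via the recursive definition."""
    g = []
    for m in range(m_max + 1):
        partial = sum(factorial(m) // factorial(m - k) * u[k] ** (m - k) * g[k]
                      for k in range(m))
        g.append((x ** m - partial) / factorial(m))
    return g

def mean_final_size(m, v):
    """E[T(m)] for one initial infective among m susceptibles."""
    g = gontcharoff(m - 1, 1.0, v)
    return m - sum(factorial(m) // factorial(m - k)
                   * v[k - 1] ** (m - k + 1) * g[k - 1]
                   for k in range(1, m + 1))

def mean_size_direct(b, v, m_max):
    """Poisson(b) average of E[T(m)], truncated at m_max."""
    return sum(exp(-b) * b ** m / factorial(m) * mean_final_size(m, v)
               for m in range(1, m_max + 1))

def mean_size_series(b, v, k_max):
    """Series form after interchanging the sums, with B = b fixed."""
    g = gontcharoff(k_max - 1, 1.0, v)
    return b - sum(v[k - 1] * b ** k * exp(-b * (1 - v[k - 1])) * g[k - 1]
                   for k in range(1, k_max + 1))

gamma, b, n_max = 1.0, 1.2, 25  # assumed illustrative values
v = [gamma / (gamma + k + 1) for k in range(n_max + 1)]
direct = mean_size_direct(b, v, n_max)
series = mean_size_series(b, v, n_max)
print(direct, series)
```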

Turning to the size of the local susceptibility set of an individual in a typical clique,
first note that conditioning on $B$ and using (B.5) gives, for $k \in \mathbb{Z}_+$,

\[
P(S(Y) = k) = E\left[ \sum_{m=k}^{\infty} \frac{B^m}{m!} e^{-B} \,
\frac{m!}{(m-k)!} (v_k)^{m-k} G_k(1|V) \right]
= E\left[ B^k e^{-B(1 - v_k)} G_k(1|V) \right],
\]

whence

\[
E_Y[f_{S(Y)|Y}(s)] = \sum_{k=0}^{\infty} (-s)^k \, \phi_B^{(k)}(1 - v_k) \, G_k(1|V). \tag{B.8}
\]
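Two simple properties of the generating function in (B.8) can be checked numerically: its
value at $s = 1$ is one, and its value at $s = 0$ is $P(S(Y) = 0)$, the probability that no
other clique member contacts the individual. The sketch below does this under the
illustrative assumptions that $B \equiv b$ is constant, so that $(-s)^k \phi_B^{(k)}(1 - v_k) =
(sb)^k e^{-b(1-v_k)}$ and $P(S(Y) = 0) = e^{-b(1 - v_0)}$, and that the infectious period is
exponential with rate `gamma`. Function names are hypothetical.

```python
from math import exp, factorial

def gontcharoff(m_max, x, u):
    """[G_0(x|U), ..., G_{m_max}(x|U)] via the recursive definition."""
    g = []
    for m in range(m_max + 1):
        partial = sum(factorial(m) // factorial(m - k) * u[k] ** (m - k) * g[k]
                      for k in range(m))
        g.append((x ** m - partial) / factorial(m))
    return g

def susceptibility_set_pgf(s, b, v, k_max):
    """Generating function of S(Y) with B = b fixed, truncated at k_max."""
    g = gontcharoff(k_max, 1.0, v)
    return sum((s * b) ** k * exp(-b * (1 - v[k])) * g[k]
               for k in range(k_max + 1))

gamma, b, k_max = 1.0, 1.2, 25  # assumed illustrative values
v = [gamma / (gamma + k + 1) for k in range(k_max + 1)]

at_one = susceptibility_set_pgf(1.0, b, v, k_max)
at_zero = susceptibility_set_pgf(0.0, b, v, k_max)
print(at_one)                              # should be close to 1
print(at_zero, exp(-b * (1 - v[0])))       # should agree
```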